National Park Time Lapse – Tranquility

Since my last astrophotography road trip in California two and a half years ago, I really haven’t spent any time writing about travel and photography. Between the camera project and my PhD, I have somehow accumulated a pile of decent photographs yet to be processed or released. But all that hard work should serve to produce better images, shouldn’t it? So I took a break over the past few weeks to finish off some of that leftover photo work.

Please enjoy my second time lapse compilation – Tranquility

Included are some of the time lapses I took in Big Bend NP, Mojave National Preserve, Death Valley, Pictured Rocks National Lakeshore and Shenandoah NP. Then there’s also the Jiuzhai Valley in Sichuan, China!

In terms of astrophotography, I only have a few images left on the hard drive for release. My recent road trips were on the East Coast; with light pollution and bad weather along the way, there really weren’t many stars to be seen, let alone opportunities for deep-space imaging.

Cygnus

Wide Field Milky Way Center shot in Death Valley

As for 360 panoramas, they have become routine for me now that the pipeline for 3×6 stitching is well established. In the meantime I have started to incorporate the floor image into the stitching process.

Carlsbad Caverns · The Window · White Sands · Big Bend · Porcupine Mountains · Tybee Island Lighthouse · Shenandoah · Death Valley

Mouse over for location, Click for 360 View

The link to my first time lapse compilation is here:

System on Camera – Part 4 Taming the Signal Integrity

One of the last interfaces I had neglected was the USB3 portion of the Type-C connector. For one thing, the SSD requires Gen2 x4 to match the instantaneous data rate. Secondly, FS309 will run standalone over 1G Ethernet or WiFi in most of our use cases. Still, it would be nice if this camera could act as a peripheral to a computer or transfer data more efficiently through USB 3. In part one, I mentioned that the GTP on my board can bifurcate into PCIe x2 plus Type-C x2, or PCIe x4 only. An analog multiplexer controls the direction of the upper two lanes on the GTP transceiver.

To test the signal integrity without a USB3 core, I decided to create an Endpoint (EP) version of the PCIe design on a separate module. Two cards are then daisy-chained with a Type-C thunderbolt cable, and one card provides power through the Type-C PD protocol. The link comes up but is stuck at Gen1 (2.5Gbps) speed. Forcing link training to a higher rate did not help.

To investigate, I decided to implement a Xilinx GTP IBERT core. Unsurprisingly, at 5Gbps the eye diagram is extremely small and the link is not error free.

IBERT eye scan and Bit Error Rate measurement with PRBS pattern

This error rate and eye diagram contrast sharply with the onboard SSD running at a 5G link rate. I did a scan by attaching AXI-Lite to the DRP interface of the running PCIe RC core in the FPGA. In the image below, all four lanes open up to the maximum height on the eye sampler. It is unlikely my analog multiplexer plays any role here.

SSD TX eye seen from GTP side of all four lanes

To isolate the cable-induced insertion loss from the PCB’s, we created a Type-C loopback adapter. The plug end loops back at the PCB connector level; the receptacle end loops back at the end of the cable.

Type-C loopback adapter layout and assembly

Once we attached the plug version to the PCB, I immediately realized the issue: the tiny EMC beads right before the connector. Possibly due to impedance mismatch, this element induces too much attenuation. Replacing it with a zero-Ohm 0201 resistor immediately recovers the eye. The ESD diodes are preserved, though.

Loopback at the Type-C after fix

Loopback at the end of cable, effectively doubling the length

With analog loopback verified, I can now try board-to-board loopback with asynchronous reference clocks. This time the eye height is much taller than in the cable-end loopback, but the width is smaller with more horizontal spikes visible, possibly due to CDR jitter from the differing reference clocks.

Far end eye scan with asynchronous reference clock and post-PCS loopback

Now the link is usable and error free. Let’s try PCIe through the Type-C cable again.

# lspci
00:00.0 PCI bridge: Xilinx Corporation Device 7124
01:00.0 Non-VGA unclassified device: Xilinx Corporation Device 7024
# lspci -vv
00:00.0 PCI bridge: Xilinx Corporation Device 7124 (prog-if 00 [Normal decode])

……………… 


        LnkCap:    Port #0, Speed 5GT/s, Width x4, ASPM L0s, Exit Latency L0s unlimited
            ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp-
        LnkCtl:    ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 5GT/s, Width x2
            TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt+
………………

01:00.0 Non-VGA unclassified device: Xilinx Corporation Device 7024

………………

        LnkCap:    Port #0, Speed 5GT/s, Width x4, ASPM L0s, Exit Latency L0s unlimited
             ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp-
        LnkCtl:    ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
             ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta:    Speed 5GT/s, Width x2 (downgraded)
            TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
………………

Gen2 x2 achieved at first power up. Awesome! Next we can think about a true USB3 core and possibly adding dual-lane LTSSM support.

System on Camera – Part 3 Cooling and the First Light

An image is worth a thousand words. Now I have three for the first light on FS309.

For the July Fourth holiday I made a camping trip to California’s best dark site (in my opinion). For three nights straight I took five hours of total integration per target. Late June to early July brings the best season for southern targets low on the horizon. It’s probably the best time for Rho Ophiuchi and, further below, the Cat’s Paw and Lobster nebulae. The Eagle nebula is also great. Since this time I preferred a survey over focusing on a single target, I decided to skip Rho-Oph, as it requires multiple shots on my scope.

Cat’s Paw & Lobster

5hr exposure of Lobster and Paw nebula (Click)

There’s some residual greenish background gradient low on the horizon. It’s hard to tell whether it’s distant light pollution or airglow.

Eagle, Omega nebula and Sh2-46

Eagle nebula (Pillars of Creation) and Omega nebula

Sh2-46 on the top right corner

5 hour exposure (Click for large image)

Sadr, Butterfly and Crescent

2.5hr exposure of Sadr, Butterfly and Crescent nebula (Click)

PixInsight process with star suppression

And original without star suppression

On the last night I tried to catch two more hours for the Paw and Lobster, but it was still low on the horizon and masked before dawn by Leavitt Peak and its sub-peaks to the south. After switching to the Sadr region I only managed to get two and a half hours of integration. All sub-frames are five-minute exposures with the sensor cooled to −10 Celsius. The camera connects directly to my WiFi hotspot and runs intervalometer capture and cooling control autonomously.

Site location

It’s my favorite location for astrophotography and hiking. The site is a Stanislaus National Forest campground at the CA-108 Pacific Crest Trail junction. At 3000 meters elevation the air is thin and the zenith is roughly Bortle 1.5. During the summer season the sky here is consistently clear and the seeing is great. CA-108 is a much more pleasant drive than CA-120 Tioga Pass through Yosemite, where you’d expect crowds at Tuolumne Meadows. Highway 4 Ebbetts Pass is tree-covered at a lower elevation and the road is narrow. The campsites are on an elevated hill, so through traffic will not shine into my telescope. No permit is required, and there are lots of alpine lakes and hiking trails nearby for the daytime. Water can easily be drawn from the creek. This isolated location is far from the light pollution of Tahoe to the north, and the Sierra to the west blocks light from California’s Central Valley. It is four hours from the Bay Area, making it perfect for a weekend excursion.

FS309 on 70SA astrograph after my last night of imaging

A ten-day long heat wave over the west during July 4th

Cooling Chamber

USB-C, Ethernet and 12V battery/DC (Will be removed for Type-C PD)

A reserved HDMI port, tested up to 1080P60

A new mechanical design balances heat dissipation and cooling power. Only a single-stage TEC is used, and we found it sufficient for 30 degrees of maximum delta-T. The cooling chamber is purged of moisture before sealing. So far I have not observed any condensation.

System on Camera – Part Two

Again, I broke my promise. What was meant to be a week turned into a few months before another post from me. But in the months between, a lot of work was completed on both the hardware and software sides. This camera also lived up to its task during the 2024 solar eclipse. Of course, that event deserves its own separate post.

FS309 at 2024 Total Solar Eclipse

Now with all that behind me, I will start updating more regularly. And I’ll try to keep each post succinct. For this one, I will showcase some software enablement.

RTOS

One of the main storage devices is the SSD, attached through the PCIe Gen2 x4 link on the PL GTP side. However, the Zynq A9 struggles to service the MSI interrupts and the file system stack. In Linux, without a host accelerator IP, SSD read/write can only achieve ~100MB/s. It’s an old scalar CPU core at a 666MHz clock. This is unacceptable for burst capture during a solar eclipse. So I ported Shane’s bare-metal NVMe driver to the Zynq A9. A quick IO test shows a PCIe Gen3 SSD can easily saturate a Gen2 x4 link in both directions. At 256KB per write command, speed is around 1.5GiB/s.

xpg_8200_ssd_7015

To make fast capture possible, the SSD driver is migrated into a FreeRTOS task. The sensor control task notifies it once a frame is ready. Furthermore, I had to enable MSI interrupts so the SSD task can yield right after IO queue submission. A further optimization is interrupt coalescing: a single frame costs around 320 block writes, and if each IO generated an interrupt, that would be 4800 per second at 15 frames a second. A feature in the NVMe protocol provides delayed MSI interrupt generation per IO queue: an interrupt is held back until either a maximum time elapses or a number of IO completions is reached. Last November I made sure the SSD could cooperate with the IMX309 sensor in a live test.
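The interrupt arithmetic above can be sketched as a quick back-of-the-envelope model. The 320 writes per frame and 15 FPS come from the text; the coalescing threshold of 32 completions per MSI is purely illustrative:

```python
def irq_rate(writes_per_frame, fps, coalesce_threshold=1):
    """Interrupts per second when the controller raises one MSI
    per `coalesce_threshold` completed IO commands."""
    completions_per_sec = writes_per_frame * fps
    # ceiling division: a partially filled batch still raises one interrupt
    return -(-completions_per_sec // coalesce_threshold)

assert irq_rate(320, 15) == 4800        # one IRQ per IO completion
assert irq_rate(320, 15, 32) == 150     # coalesced: 4800 / 32
```

In the real controller the aggregation-time limit bounds latency when completions trickle in slowly; this sketch only models the threshold side.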

nvme_309 

IRQ and time cost

In the debug log above, the entry and exit times between each task yield are accumulated, and the total is printed once a frame is written. For a one-second capture at 15FPS, the SSD task consumes around 100ms. This is about 10% CPU usage of a single core. Notice the printing delay from the ARM JTAG DCC compared to a normal UART; I ran out of MIO on the PS side for a UART.

In the following months I removed the LWIP task and opted for a lightweight UDP task. The main goal is to shrink the memory footprint and run the entire RTOS inside the OCM and L2 cache. The entire DDR range is marked as non-cacheable device memory and used for DMA buffers and descriptors. This reduces open pages on the DDR controller, since our bidirectional read/write bandwidth approaches 3GB/s. The final ELF plus program memory fits within 128KiB.

    canvas

5 x 5 panel of stars for focal plane adjustment and focusing

Linux

With the solar eclipse expedition a huge success, I have had more time to improve the Linux side recently. One interesting feature I came up with is a star mosaic panel covering the entire FoV. An image is divided into N x N sections, 5×5 in this case. The brightest star in each section is selected and refined with a centroid algorithm, then the local 32×32 neighborhood is sliced out into a mosaic. This is great for focal-plane tilt adjustment, optical alignment and focusing. I had to improve the first selection step to O(N) time complexity because the image is so big. The subsequent centroid is based on a multi-pass center of mass. In the above image, you can clearly see the residual optical aberration for stars away from the center of the field, and the faint tail pointing towards the center.
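The mosaic step can be sketched as follows, assuming a NumPy image array. The brightest-pixel selection and 32×32 crop follow the text; the multi-pass center-of-mass refinement is omitted, and all names are illustrative:

```python
import numpy as np

def star_mosaic(img, n=5, tile=32):
    """Split `img` into n x n sections, take the brightest pixel of each,
    and assemble the tile x tile neighborhoods into one mosaic image."""
    h, w = img.shape
    sh, sw = h // n, w // n                      # section size
    mosaic = np.zeros((n * tile, n * tile), dtype=img.dtype)
    for i in range(n):
        for j in range(n):
            sec = img[i*sh:(i+1)*sh, j*sw:(j+1)*sw]
            y, x = np.unravel_index(np.argmax(sec), sec.shape)
            # clamp the crop window so it stays inside the section
            y = min(max(y - tile // 2, 0), sh - tile)
            x = min(max(x - tile // 2, 0), sw - tile)
            mosaic[i*tile:(i+1)*tile, j*tile:(j+1)*tile] = sec[y:y+tile, x:x+tile]
    return mosaic

# tiny demo: a dark frame with a single bright "star"
demo = np.zeros((200, 200))
demo[30, 40] = 100.0
panel = star_mosaic(demo)
```

A real implementation would run the centroid passes on each crop and handle stars near section borders; this sketch only shows the tiling idea.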

The centroid algorithm is relatively fast, consuming 30% of a single Zynq core at a 2Hz update rate. The image is pushed to my Android phone in real time, and the front-end Javascript rescales and zooms it for better viewing on a phone screen.

 

A lot of time was spent on low-level drivers and bring-up of each hardware subsystem: the WiFi/BT module, assigning GPIOs in the device tree, and even the IO expander on I2C requiring its own driver component. Then there’s power management; the power button and properly shutting down the operating system also take effort.

USB-C PD

Notably, I included Type-C power delivery so a single PD hub can power the whole system and add more USB ports, just like on your laptop. Luckily, the CC-PHY driver is available in the upstream kernel with basic PD and OTG functionality.

PD negotiated to 20V sink and USB host detected a SD card

When powered from an internal “battery”, it can also source power to my hub at 5V 3A. But this is unlikely to be a use case for an astronomical camera. For this kind of debugging, I prefer a bare board to ease recovery from bricked firmware.

TEC and Cooling control

Cooling is another essential for deep-sky imaging. I’ll cover the hardware part later in this series; the closed-loop software control is a PID. The temperature sensor is either a thermistor measured directly by the FPGA’s built-in XADC, or a temperature-sensing IC on the I2C bus. Either way, the component sits on the main CIS PCB or inside the CIS itself.

The TEC can be controlled either by voltage or through PWM. The voltage is set through a variable-voltage buck converter with current monitoring; the PWM comes from the Zynq PS’s TTC module.
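A bare-bones sketch of what such a PID loop could look like, mapping temperature error to a PWM duty cycle. The gains, update period and class name are illustrative, not the camera’s actual tuning:

```python
class TecPid:
    """Minimal positional PID for TEC cooling (illustrative gains).
    Output is clamped to a 0..1 PWM duty cycle."""
    def __init__(self, kp=0.5, ki=0.05, kd=0.1, dt=1.0):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def update(self, setpoint_c, measured_c):
        # cooling: drive harder when the sensor is above the setpoint
        err = measured_c - setpoint_c
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        duty = self.kp * err + self.ki * self.integral + self.kd * deriv
        return min(max(duty, 0.0), 1.0)   # clamp to a valid PWM range

pid = TecPid()
duty = pid.update(-10.0, 20.0)   # sensor 30 degrees above target: full power
```

A production loop would also add integral anti-windup and rate limiting to avoid thermal shock to the sensor.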

INDI and Astrometry

Back in user space, my collaborator helped me port both INDI and the astrometry.net plate solver to our Linux. For the latter, the entire Makefile was rewritten to make it portable in buildroot. I do not need the fancy drawing features, just the solver itself for pinpointing the stars.

System on Camera – Part One

This series will be the culmination of many years of my work. I am presenting this all-in-one astronomical camera platform. The goal is a camera capable of both low-noise deep-sky imaging and high performance for astronomical events like occultations and eclipses. With an application processor built in, it enables complex capture scripting and some preprocessing without an external PC, while providing a myriad of connectivity options to mobile devices for initiation and status monitoring. I’m also happy to announce that the hardware will be available at close to its cost (BOM, labor, testing). As a plus, all the Linux kernel drivers, software and bootloader will be open source to buyers joining our community.

Previously I had planned to design around SoCs like the Rockchip platforms, but limitations in their ISPs and data paths proved way too cumbersome. An FPGA still provides the flexibility that we want, and in some cases lets us design the data paths to suit mission requirements.

In this part one, I’m going to introduce this platform, its hardware components and sensors supported.

In the flattened connection above, I have the IMX309 module as the target sensor. The Peltier cooling element and its power controller are connected to the main board. Next let us focus on the motherboard, which provides the most versatile configuration options and connectivity.

On the front of the main board we have a Xilinx Zynq 7015 system-on-module. Like all Zynq 7000 devices, the maximum memory capacity is 1GB. An onboard eMMC for the ROOTFS and user storage is also included. The connectors are as follows.

  1. Three FPC cable connectors for image sensor interfaces. Each connector fans out 22 pairs of differential signals from the same IO bank. Normally the IO standards supported are LVDS, subLVDS and SLVS(-EC). Each pair can also split into two single ended signals for sensor configuration and power control.

  2. A micro-HDMI connector. I have already run a test pattern at 1080P60 successfully. Signal traces are ESD-protected, and monitor information can be probed over the onboard I2C bus.

  3. TEC cooling. This connects to the peltier power regulator board. Power runs at 12V and is controlled by combination of PWM and direct voltage/current setting through the I2C bus.

  4. USB Type-C with PD control. The 2.0 pair connects directly to the Zynq SoM’s onboard PHY with OTG capability. Variable voltage modes from 5 to 20 volts are supported through the CC pins; the board can act as either a power source or a consumer. The entire board, TEC included, can run on a typical laptop Type-C power adapter, and Type-C hubs can also be used. The two superspeed TX/RX pairs are shared with PCIe through a passive MUX. Superspeed support is still pending RTL development; in the future, you could run PCIe x2 plus USB 3 Gen 1 x2 for fast transfers to a PC.

On the back side, we have a Gen2 x4 PCIe M.2 Key M connector for an SSD. I have achieved 1.5GiB/s sequential write bandwidth under an RTOS. Additionally, there is a 2.4G WiFi/BT combo module on the Zynq PS’s SDIO interface. Additional connectors include:

  1. A 1000Base-T ethernet interface for Zynq PS

  2. Lithium-ion battery cable connector supporting 1~4 cells. A single cell maxes out at 4.2V fully charged. The default option uses three cells in series, or 12V, for TEC efficiency. Type-C PD can charge this battery directly

  3. A power connector for the image sensor module

  4. JTAG debug interface

Given the flexibility of reconfigurable IOs, this platform supports a variety of imaging sensors. A total of three 22-pair IO banks even allows some high-pin-count, large-format scientific CMOS sensors. In the end, your limitation will be the DDR3 memory bandwidth. At the time of writing, I have the following working.

  1. IMX309 – Linux kernel driver and RTOS ready

  2. IMX235 – Linux kernel driver and RTOS ready

  3. IMX410 – RTOS only. Not recommended for astrophotography due to the presence of PDAF pixels. IMX410CQK version requires a new PCB design

  4. GPixel GSense4040FSI/BSI – RTOS only

  5. GPixel GSense5130 – RTOS only. Unfortunately I was notified by GPixel that they discontinued this sensor

Other sensor modules in development will be added to this list in the future.

GSense4040

There are several use cases for high-performance capture, including time-domain astronomy. I plan to deploy this camera for the 2024 total solar eclipse. No other camera out there can achieve the same dynamic range and capture rate as a direct NVMe sink of a multi-megapixel imaging sensor at its maximum frame rate.

Next week, I’ll discuss power consumption, bandwidth performance and some aspects of software. Stay tuned.

A prototype cooling casing

2.304Gbps SLVS-EC IP using Ultrascale+ HPIO only

Two years ago, I successfully decoded the SLVS-EC protocol and made a working receiver IP block on the Xilinx 7 series HR IO. The speed is limited to 1250Mbps on Artix IO banks or 1600Mbps on a Kintex-7. I can overclock a bit to 1.4Gbps, but it’s time to push the design to its full potential. Ultrascale+ devices have been in numerous products for a while now and their cost has come down. The HPIO banks in the Ultrascale generation are capable of 2666MT/s in a DDR4 memory interface. With a differential SLVS signal, this should comfortably satisfy the 2.3Gbps.

From Xilinx UG571

Each RX_BITSLICE in native mode contains a deserializer, a tap delay line and an IN_FIFO. Two adjacent RX_BITSLICEs are reachable from a differential IO buffer, and each nibble group has three such pairs

The catch, however, is always in the details. With 20nm Ultrascale, Xilinx completely overhauled the IO architecture. To reach speeds above 1.25Gbps we have to use the “Native” IO mode instead of the “Component” mode that is backward compatible with the 7 Series. Native mode only supports 1:8 deserialization, with 1:4 under a pseudo-SDR mode. Ratios like 10, 14 and 4~7 are dropped to reduce routing delay from reconfiguration. It also forces you to include the IN_FIFO/OUT_FIFO normally used for memory IO bursts. The delay line is hugely improved to 512 taps with greater than 1.2ns total delay, but its control is much more tedious in “Time Mode”. The same architecture appears in the new 7nm Versal devices as well, with the speed pushed beyond 4266MT/s for LPDDR4.

Internal 1D eye with sampling location (‘x’ is rising/falling edge, each char is 16 taps combined)

Eye measured with a differential probe on MSOX6004A (UI == 434ps, Vertical 200mV/DIV)

After spending two weeks migrating and tuning a new phase recovery algorithm, I was able to receive video frames from an IMX410BQT sensor at 40FPS in 12-bit all-pixel scan mode. There are no 8b/10b decoding errors, indicating a robust CDR algorithm. The initial eye scan takes only 66us, and eye tracking can be turned off as a feature if the user desires.

Curiously, I have not observed the phase drift with rising system temperature seen on my Zynq-7000 module. I wonder if that was caused by the voltage translator IC on my carrier card. On my newly designed XCZU4EV devkit, the oscillator reference clock is fed directly into the HPIO bank.

Only one HPIO bank is used for a MIPI camera connector and IMX410

Other features include full-fledged Type-C with PD, DisplayPort and USB3.1 OTG, M.2 SSD and WiFi/BT

In conclusion, this SLVS-EC receiver IP does not rely on dedicated multi-gigabit transceiver blocks like those from Framos or Macnica. It is much more power efficient and enables low-cost FPGAs like the ZU1~3. It leaves your GTH free for true high-bandwidth bidirectional traffic like a 40Gbit fiber network or PCIe SSDs.

ip_block

post_route

Post-route resource usage in the ZU4EV device. Green highlights the SLVS-EC blocks and yellow indicates the VDMA block

If you’re interested, please leave a comment with your contact information for any questions related to IP usage, design suitability and licensing.

Sony a7S III has a 2×2 pixel binning IMX510 BSI sensor

In 2017, ChipMod took a microscopic image of the IMX235 sensor from a Sony a7S II camera, showing the very large opening of its pixel photodiode. Then in early 2020 we successfully interfaced it with our custom FPGA board. Now, with the third generation released, we want to know whether the BSI model has improved image quality.


Read Noise

Read noise in ADU chart from photonstophotos.net

Judging from the read noise chart, the third-generation BSI performs half to a full EV worse than the second-generation FSI sensor. The performance only picks up slightly after the a7S III switches to its high-gain mode above ISO 1600.


DXOMark

Then again, DXOMark shows it performing a third of a stop worse than its second gen. Please note that at 18% raw gray scale, the noise is pretty much dominated by photon fluctuation, or shot noise. Thus SNR18 usually reflects the total quantum efficiency. From the chart we can see the IMX235 FSI performs almost as well as the IMX451 BSI sensor from the a7R IV when viewed at the same size. This shows how well a large pixel can be optimized to maximize light collection. Now the question is what happened underneath the pixels to cause such a step back in imaging quality.
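The claim that SNR18 reflects total quantum efficiency follows from the shot-noise limit: the signal is the collected photoelectron count N and the noise is √N, so SNR = N/√N = √N, which scales as √QE. A toy calculation, with an entirely illustrative saturation level:

```python
def snr18(saturation_photons, qe):
    """Shot-noise-limited SNR at 18% gray.
    `saturation_photons` is an illustrative full-scale photon count."""
    n = 0.18 * saturation_photons * qe   # collected photoelectrons at 18% gray
    return n ** 0.5                      # SNR = N / sqrt(N) = sqrt(N)

# doubling QE improves SNR18 by sqrt(2), i.e. half a stop
ratio = snr18(100_000, 0.8) / snr18(100_000, 0.4)
```

This is why a third-of-a-stop SNR18 deficit translates to a meaningful loss in effective quantum efficiency.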


We recently got an a7S III sensor module damaged by a laser light show. It came with flex cables and an image stabilization (IS) platform. In the image below, the right two flex cables connect to the sensor module. These contain the CMOS Imaging Sensor (CIS) power supply, driving signals and 8 pairs of SLVS-EC pixel data channels. The leftmost cable controls the coils of the IS module.


A7Siii_module_back

A7Siii_module_front

We then removed the sensor module from the IS platform. The flex cables are high-density, 0.2mm pitch. Unlike previous Sony CIS modules, this one no longer has a center PCB cutout for direct thermal relief to the metal plate. We had to remove the PCB with a rework station to reveal the CIS part number.


IS-1036

A7Siii_CIS_removed

The CIS part number is IMX510AQL, in a 294-pin LGA ceramic package, the same package as the a7R IV’s IMX451AQL 61MP sensor. But judging from the PCB traces, the pinout definition is different. There were rumors saying the IMX510 was going to be a new 32MP APS-C sensor. This just shows how unreliable and inaccurate leaked specs without proof can be.


IMX510AQL_identified

Let’s heat-remove the cover glass and inspect the pixels under a 50x microscope objective. It turns out this sensor is a 2×2 binning design, which means the IMX510 actually has a 48MP native resolution. The RGGB Bayer pattern is spread across a 4×4 grid. After sensor readout, the four pixels of the same color are combined digitally into one pixel before being sent out on the SLVS-EC interface. This could explain the increase in read noise. To my knowledge, no Sony DSLR CIS supports charge binning, due to limitations in its pixel architecture. By combining four pixels digitally, you increase the noise variance by four, and hence the read noise almost doubles (square root to RMS). The bright green pixels are phase-detection pixels for the hybrid AF system.
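The read-noise argument can be checked numerically: summing four independent pixels quadruples the variance, so the RMS doubles. A quick Monte-Carlo sketch with illustrative numbers, not sensor data:

```python
import random

def binned_read_noise(sigma, n_pix=4, trials=200_000, seed=1):
    """Monte-Carlo estimate of the RMS noise of a digital sum of
    `n_pix` pixels, each with independent Gaussian read noise `sigma`.
    Expected result: sigma * sqrt(n_pix)."""
    rng = random.Random(seed)
    total_sq = 0.0
    for _ in range(trials):
        s = sum(rng.gauss(0.0, sigma) for _ in range(n_pix))
        total_sq += s * s
    return (total_sq / trials) ** 0.5

rms = binned_read_noise(1.0)   # four 1e- pixels summed: expect ~2e- RMS
```

Charge-domain binning would avoid this penalty, since the four charges are combined before the read-noise source; that is exactly the capability the text says these pixels lack.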


A7S3-50xIMX235 50x

IMX510 (top) 2×2 binning pixel vs FSI single pixel in IMX235 (bottom), both under 50x objective

IMX235 has Bayer and microlens removed showing its top metal layer


So the final question is why Sony went down this design path. I came up with two possible reasons. 1. Sony already has a BSI pixel design fitting this 4.2um pitch. A 2×2 binning approach is a lot faster to bring to market than starting a new 8.4um pixel. Since most pixel design layouts are fixed, scaling the array can yield multiple sizes of chip; for example, the IMX411, IMX461, IMX455, IMX571 and IMX533 are all based on the same 3.76um BSI pixel design, but each covers a different image circle, from medium format down to 1-inch. 2. Sony wanted to emphasize the HDR video capability of the a7S III. A single pixel is limited in dynamic range, but each of the four sub-pixels can be read out with a different gain or exposure time and later combined digitally, with weighting, into the final value. This method is used in many Sony security sensors; the IMX294 and IMX482 also employ a 2×2 binning BSI design.


Regardless of imaging quality, the third gen has a huge improvement in readout speed thanks to its BSI architecture. After all, this camera is mainly aimed at cinematographers. Its all-pixel scan rate has drastically increased from 30FPS to 90FPS, and 1080P60 no longer needs subsampling as it did on the IMX235. Engineering has always been a balancing act. Still, it would be great to see a single large BSI pixel without microlenses achieving sCMOS-grade quantum efficiency.


Decoding the SLVS-EC protocol from IMX410BQT

In my D850 hacking post I mentioned another Sony sensor, the IMX410BQT. It bears a similar IC package to the IMX309AQJ, and the connector shares an identical pinout. We suspected it must come from the Nikon D780. Recently the ChipMod workshop had a laser-damaged sensor from a Z6 mirrorless camera, bearing the IMX410BQJ marking code.

IMX410BQJ version (Nikon Z6) on the left and BQT (presumably D780) on the right

The PCB layouts are 100% identical between the two. On the J package, a molded plastic frame facilitates mounting to an optical stabilization device.

We decided to plug the BQT version into the Z6 camera, and it works!

This proves both packages are electrically and functionally identical

Signaling

The only difference from the IMX309 resides in the high-speed data outputs. The IMX410 only uses SLVS lanes 0~7. The DDR clock pins and the P/N pins of data lanes 9~15 are all shorted to ground instead. Lane 8 is connected on the sensor PCB side, but its negative pin is shorted to ground and its positive pin left floating on the Z6 flex cable. Based on this, the IMX410 is definitely running the SLVS-EC protocol, where EC stands for embedded clock. Similar to PCIe Gen 1 and 2, data is packed and coded with 8b/10b before going out to the PHY layer. This makes the signal DC-balanced and guarantees sufficient transitions for clock recovery.

Unfortunately, SLVS-EC is a proprietary protocol and not as popular as MIPI D-PHY. Detailed information on its packet format and encoding is scarce, and there aren’t many SoCs with open datasheets supporting SLVS-EC sensors. On the FPGA side, there are several IPs supporting the SLVS-EC protocol, but all of them use the gigabit transceivers and require hefty licensing fees. My MicroZed Zynq 7010 only has HR IO, and my Zynq Ultrascale+ ZU4EV only supports up to four lanes on its GTH. Neither looked like an optimal solution to me.

However, since Vivado 2019.1 Update 1, Xilinx finally certifies its Ultrascale+ HPIO for data rates up to 2.5Gbps in D-PHY. D-PHY in high-speed mode is in fact running SLVS IO. This opens the possibility of interfacing up to three 8-lane sensors per HPIO bank, and saves the GTH transceivers for real full-duplex multi-gigabit applications like PCIe Gen3/4 or 40Gbit Ethernet.

Clocking

To start off, I decided to work on my ver.2 PCB at reduced frequency. If frame rate is not a concern and I am mostly running the slow 14-bit ADC, data transfer won’t be a showstopper here. Additionally, SLVS-EC also offers a 1152Mbps line rate besides the default 2304Mbps, if I manage to find the PLL registers.

Another trick is to synchronize the FPGA to the IMX410 sensor so I can avoid clock-data recovery altogether. This is similar to PCIe, where both root complex and endpoint run on the same external 100MHz reference. On the Nikon IMX410/309 module, the 72MHz oscillator output has a T-branch: one end goes into the CIS and the other to the connector. This connector pin is left floating on both the Z6 and D850 flex cables, so I feed it into an FPGA MRCC pin. If I do not enable the oscillator during the power-on sequence, its output tristates. This makes it possible to drive the clock directly from the FPGA at a reduced frequency.

Normally the sensor runs a 72MHz x32 PLL multiplier, giving a 2304Mbps output. I initially reduced the input to 40MHz, making 1280Mbps, well within reach of the HR IO on a 7-series FPGA. The MMCM generates a 320MHz BUFIO clock and a 160MHz fabric sampling clock.

The line period is 6.53us on the Z6 during liveview and 12-bit still shooting. In 14-bit mode, the line period increases to 12.65us. Both these periods are further extended by 80% to account for the reduced master clock.

Using the Version 2 IMX309 carrier card for IMX410BQT

Protocol Analysis

For decoding, I configured both ISERDESE2 receivers on each IO differential pair in 4-bit DDR mode. The positive and negative ends of the ISERDES first scan the transition edges one IDELAYE2 tap apart. Once the center of the eye is locked, the negative end shifts to the next eye location, making both ISERDESE2 work in QDR mode under a 320MHz BUFIO clock. Combining odd and even bits generates 8-bit deserialized data, which directly feeds my DMA in logic analyzer mode. In 600ms it captures 768MiB on the PHY layer.

When I’m not generating XHS/XVS pulses, the sensor outputs a constant repeating pattern: 10’b0110001011. Checking the 8b10b code table, this corresponds to D00.0, the IDLE code when the SLVS-EC transmitter is active. When XHS/XVS are regularly pulsed, IDLE codes fill the blanking space between data packets. This means I can use D00.0 as the word-alignment key, given its repeating nature. After some analysis, I found the packet boundaries closely match Table 2 from this Microsemi document.
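This alignment step can be modeled in software. The sketch below is illustrative only (the `align_offset` helper and the string representation of the bitstream are my own; real hardware would use a barrel shifter on the deserialized words):

```python
# Sketch: find the 10-bit word boundary in a raw PHY bitstream by using the
# repeating D00.0 IDLE symbol as the alignment key.
IDLE = 0b0110001011  # the observed 10-bit D00.0 symbol

def align_offset(bits: str) -> int:
    """bits: raw PHY bitstream as a '0'/'1' string.
    Try all 10 bit offsets; return the one with the most IDLE matches."""
    def idle_count(off: int) -> int:
        words = (bits[i:i + 10] for i in range(off, len(bits) - 9, 10))
        return sum(int(w, 2) == IDLE for w in words)
    return max(range(10), key=idle_count)
```

During blanking the stream is nothing but IDLE symbols, so the correct offset wins by a large margin.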

Start Code K.28.5 – K.27.7 – K.28.2 – K.27.7

End Code K.28.5 – K.29.7 – K.30.7 – K.29.7

Pad Code K.23.7 – K.28.4 – K.28.6 – K.28.3

Deskew Code K.28.5 – 0x60 – 0x60 – 0x60

On the PHY layer, each packet begins with a Start Code and stops with an End Code, immediately followed by a Deskew Code. Between the start and end codes is packet data with the occasional Pad Code. These Pad Codes appear at seemingly random locations and should be ignored by the PHY layer decoder; their placement is probably due to frequency mismatch between the XHS pulses and the internal SLVS-EC transmitter.

IDLE .. IDLE – Start – Packet Data – End – Deskew – IDLE .. IDLE

After 8b10b decoding yields clean 8-bit data, packet data is byte-wise distributed over the eight data lanes. After assembly, each packet begins with a 24-byte header. On the IMX410 this header is 8 bytes of useful information duplicated three times for redundancy. The fields within this 8-byte block are defined as follows:

Bits    Field
1    SOF
1    EOF
1    Line valid
13    Index (1..8191)
1    Embedded
31    Reserved (Fixed to 0)
16    ECC

The index value begins at 1 and is reset at the first XHS after every XVS pulse. The first few rows before the effective pixels carry embedded data. From my observation of the IMX410, the reserved fields are set to zero. The ECC field is a 16-bit XOR of selected bits among the 48 MSBs of the 8-byte header. I have decoded the following bits, excluding the zero-fixed reserved fields.

Bit    Field    ECC bits affected
31    EBD     0b1000000000111111
32    IDX0     0b1000000001111011
33    IDX1     0b1000000011110011
34    IDX2     0b1000000111100011
35    IDX3     0b1000001111000011
36    IDX4     0b1000011110000011
37    IDX5     0b1000111100000011
38    IDX6     0b1001111000000011
39    IDX7     0b1011110000000011
40    IDX8     0b1111100000000011
41    IDX9     0b0111000000000011
42    IDX10     0b1110000000000110
43    IDX11     0b0100000000001001
44    IDX12     0b1000000000010010
45    Valid     0b1000000000100001
46    EOF     0b1000000001000111
47    SOF     0b1000000010001011

A 1 in the ECC bits column indicates that the corresponding ECC bit includes that input bit in its XOR.
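To make the table concrete, here is a software model of the check built directly from the decoded masks above (the `header_ecc` helper is my own name; bit numbering follows the table, with bit 0 as the LSB of the 8-byte header):

```python
# XOR masks decoded from the IMX410 header ECC, indexed by header bit.
ECC_MASKS = {
    31: 0b1000000000111111,  # EBD
    32: 0b1000000001111011,  # IDX0
    33: 0b1000000011110011,  # IDX1
    34: 0b1000000111100011,  # IDX2
    35: 0b1000001111000011,  # IDX3
    36: 0b1000011110000011,  # IDX4
    37: 0b1000111100000011,  # IDX5
    38: 0b1001111000000011,  # IDX6
    39: 0b1011110000000011,  # IDX7
    40: 0b1111100000000011,  # IDX8
    41: 0b0111000000000011,  # IDX9
    42: 0b1110000000000110,  # IDX10
    43: 0b0100000000001001,  # IDX11
    44: 0b1000000000010010,  # IDX12
    45: 0b1000000000100001,  # Valid
    46: 0b1000000001000111,  # EOF
    47: 0b1000000010001011,  # SOF
}

def header_ecc(header48: int) -> int:
    """XOR together the masks of every set bit in the 48 MSB header bits."""
    ecc = 0
    for bit, mask in ECC_MASKS.items():
        if header48 >> bit & 1:
            ecc ^= mask
    return ecc
```

Comparing this against the received ECC field flags corrupted headers; with all three redundant copies, a majority vote can then recover the line index.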

Following the 24-byte packet header is the payload, organized in chunks of 224 + 4 bytes. Each 224-byte chunk carries packed pixel data and is followed by a four-byte parity checksum over that chunk. The last chunk may be shorter than 224 bytes but still has a four-byte checksum. All data chunks are concatenated to form the packed pixel stream for each row.
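A hypothetical software model of this chunking, assuming the packed row length is known in advance from the configured width (`split_chunks` is my own name, not an actual API):

```python
def split_chunks(payload: bytes, row_bytes: int):
    """Split a header-stripped packet body into (data, checksum) pairs.
    Each chunk carries up to 224 payload bytes followed by 4 checksum bytes;
    the final chunk may be shorter than 224 bytes."""
    pos, remaining = 0, row_bytes
    while remaining > 0:
        n = min(224, remaining)
        yield payload[pos:pos + n], payload[pos + n:pos + n + 4]
        pos += n + 4
        remaining -= n
```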

The packing method is identical to MIPI D-PHY: each packed block is 8-bit aligned. 12-bit RAW packs 2 pixels into 3 bytes, and 14-bit RAW packs 4 pixels into 7 bytes. The MSB 8 bits of each pixel are filled first, followed by an MSB-justified concatenation of the LSBs.

p0 = data[:,:,0] << 6 | ((data[:,:,4] >> 2) & 0x3F)
p1 = data[:,:,1] << 6 | (((data[:,:,4] << 4) | (data[:,:,5] >> 4)) & 0x3F)
p2 = data[:,:,2] << 6 | (((data[:,:,5] << 2) | (data[:,:,6] >> 6)) & 0x3F)
p3 = data[:,:,3] << 6 | (data[:,:,6] & 0x3F)

Numpy soft decoding of 14-bit RAW data

Decoded Image

Now I finally have valid data demonstrating a successful image capture. The image size is 6104 x 4234 in still capture mode. Read noise is low, with no visible row ripple from the power supply on my carrier card.

My next goal is to improve signal integrity and increase readout clock frequency.

Differences to IMX410AQL version in Sony A7III

Sony also employs the IMX410AQL in their A7III mirrorless camera. In the beginning I tried using the SPI configuration sniffed from a Sony A7III on the BQT version. The SLVS transmitter turned on, and with specific register writes the sensor responded to XHS/XVS pulses by generating training sequences. But the BQT version never output a valid image, only empty data. Clearly the BQT version is customized and requires some other private configuration.

With the SPI configuration from an actual Nikon Z6 camera, this BQT finally works. I compared the final register settings between Sony and Nikon. Many of them differ, but all the functional registers (mode, analog/digital gain, shutter, ROI, etc.) share the same addresses. This CIS might have a one-time-programmable area burnt at the fab with driving settings customized for Nikon.

Both the Nikon Z6 and Sony A7III have on-chip phase detection pixels. Sensitivity compensation for those pixels is done in the ISP. Thus with direct readout, I should be able to see those pixels when the lens cap is off.

Repeated horizontal lines from the blue Bayer channel (12bit silent still)

Repeated bright row every 12 rows (6 here with a single Bayer channel extracted)

Usually phase detection pixels have 50% of their opening masked, so they should appear dimmer, not brighter. To find out why, I asked the ChipMod lab to investigate under a microscope.

It turns out the blue dye is replaced with green at the phase detection pixels. For most light sources the green channel collects more photons, so it makes sense for the focusing elements to get more light.

A Bayer removed IMX410BQT under 40x magnification

I wonder how BSI focusing pixels have their masks implanted, unlike FSI pixels where the metal-1 layer can simply be left half open.

Phase detection pixels on IMX410AQL shows a more irregular pattern

Nikon’s phase detection pixels stride regularly every 2 columns and 12 rows. Sony uses a more irregular pattern, every 4 columns with 6 or 12 rows in between. There’s another version on Sony’s website – IMX410CQK – which probably doesn’t have any phase detection pixels.

Update 2022

Working SLVS-EC RX IP Block

image

With all the decoding methods and information in place, I created a working receiver IP for this protocol on Zynq 7010 HR IO banks. The key to success was continuously tracking the best eye location through phase recovery. A second pair of IDELAYE2 and deserializer paths was instantiated to monitor the boundary of transitional edges and adjust the sampling location. This way I can counter temperature/voltage related phase drift against the reference clock. This design works at Xilinx’s specified maximum frequency of 1.25Gbps. I even successfully overclocked it to 1.5Gbps on the cheap Artix-7 based IO architecture!

image

Captured and post-debayered image from my working receiver IP

Signal Integrity


The above eye diagram is 720Mbps SLVS signaling from an IMX309. The on-chip differential termination resistor works well at the 200mV common mode, and no external resistors are required on the PCB. The differential swing and common mode voltage are the same between SLVS and SLVS-EC. However, the EC variant shows a pre-emphasis spike at the rising/falling edges of the eye diagram. This might be a deliberate setting for impedance matching on Nikon’s flex cable. I suspect there are undocumented registers controlling the SLVS-EC transmitter PHY within the Sony image sensor.

On the newer Ultrascale/+ architecture, HPIO banks can be clocked at a much higher 2.5Gbps for POD12 and MIPI-HS. MIPI HS is in fact true differential, driven by a constant current source. I suspect LVDS can also be driven at 2.5Gbps; the only reason MIPI requires a VCCO of 1.2V is to accommodate LVCMOS12 during LP operation. The internal termination might be compatible between LVDS18 and MIPI-HS12.


Porting my IP from the 7-series IO interface to Ultrascale+ yielded exciting results. The eye diagram is successfully scanned at the full 2.304Gbps speed, and the eye opening is much bigger than on the Zynq 7010 HR IO.

IMX235, IMX071AQE and Foveon F20A

In early May this year I extended the work on the IMX309 to other Sony sensors that share a similar serial data protocol. The IMX235 is the heart of the Sony A7S and A7SII. Its large pixels give a better fill factor than smaller-pixel designs on the same process node. In fact, its 18% SNR curve makes this sensor comparable to the Nikon D850’s back-illuminated CMOS.

Bayer stripped IMX235AQR from Sony A7S, a new cover glass installed by ChipMod

There are two packaging versions, AQR and AQL, sharing the same LGA pinout. The latter has an integrated mounting frame to minimize the dimensions required in a 5-axis image stabilization system. We compared the SPI register settings and both are the same. The interface is 12-lane sub-LVDS running at 432Mbps DDR. This sensor uses the standard ITU sync code with a 4-word SOL/EOL sequence (FFF 000 000 XXX). It also has dual-row readout similar to the KAC-12040: odd and even rows are distributed between two 6-pair lane groups.

Even rows – 0 7 2 9 4 11

Odd rows – 6 1 8 3 10 5

subLVDS lane distribution in 6 x 2 pixel blocks
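In software, the lane reordering can be modeled as below. This is a sketch with my own assumptions about the sample ordering within each lane (each lane as a flat stream, rows emerging in even/odd pairs), not the actual FPGA implementation:

```python
import numpy as np

# Columns 0..5 within each 6-pixel block map to these sub-LVDS lanes,
# per the distribution above: even image rows on one 6-lane group,
# odd rows on the other.
EVEN_LANES = [0, 7, 2, 9, 4, 11]
ODD_LANES = [6, 1, 8, 3, 10, 5]

def demux_rows(lanes: np.ndarray, width: int) -> np.ndarray:
    """lanes: (12, samples) array, one flat sample stream per lane.
    Returns a (rows, width) pixel image, rows interleaved in even/odd pairs."""
    n_blocks = width // 6
    n_pairs = lanes.shape[1] // n_blocks
    per_lane = lanes.reshape(12, n_pairs, n_blocks)
    img = np.empty((2 * n_pairs, width), dtype=lanes.dtype)
    for col, (el, ol) in enumerate(zip(EVEN_LANES, ODD_LANES)):
        img[0::2, col::6] = per_lane[el]   # even rows
        img[1::2, col::6] = per_lane[ol]   # odd rows
    return img
```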

There is lane multiplexing at different operating modes. Live preview uses only 4 lanes; 4K video and still capture use all 12. As for bit depth, still and magnified modes use 14 bits, where the line rate is limited by ADC readout. 4K video and silent still run a 12-bit all-pixel scan. In electronic first curtain mode, the row reset sweep is synchronized by an external signal on the sensor’s XPI pin.

XPI sweep after the first XVS pulse, at the same rate the mechanical curtain travels

The charge reset alone is much faster than the ADC readout. On the IMX309 the XPI pin is not used; instead the EFCS sweep is done internally using a series of register settings defining vertical segments and progression speed.

PCB layout with main power supplies

Flex cable connection between the sensor module

Integration with A7S chassis and microZed module

Mounting a Nikkor 50mm lens with adapter

With the rest of control logic and pixel reorganization in place, I can stream video quickly with exposure control.

14bit rolling shutter with vertical readout cropping (click to expand)

IMX071AQE

The Pentax K-5 and K-5II employ this sensor in a ceramic package. The silicon die looks identical. The MSB lane (7th bit) is not used, and the data stream is sent in parallel mode over sub-LVDS lanes 0~6. The difference in register settings between this and the IMX071AQQ is minimal. I actually tried the AQE register settings on the AQQ variant years ago, but the output remained in the same serial format. It appears the lane setting might be burnt into some OTP memory at the factory.

Same PCB with the other connectors for Pentax K-5

I’m not going into detail as this one is identical to the AQQ in the D5100/D7000. Even the width of each row read is the same – 5040 pixels. The synchronization sequence consists of four pixel words with some non-standard codes. On the first line the last word is 000E/000A, indicating a start-of-frame sequence.

SOL: 226E – 3715 – 0A84 – 000C

EOL: 026E – 3715 – 0A84 – 0008
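A minimal software detector for these codes might look like this (hypothetical sketch; the word values come straight from the sequences above):

```python
SOL = (0x226E, 0x3715, 0x0A84, 0x000C)
EOL = (0x026E, 0x3715, 0x0A84, 0x0008)

def find_sync(words, pattern):
    """Return indices where a 4-word sync pattern starts in a 16-bit stream."""
    return [i for i in range(len(words) - 3)
            if tuple(words[i:i + 4]) == pattern]
```

The first line substitutes 0x000E/0x000A for the last SOL word, so a frame detector would match on the first three words and branch on the fourth.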

Even though this version employs parallel data transmission, the pixel output range is still capped at 0x3FFE, just like the serial AQQ version.

Foveon F20A

Another sensor of interest is the unique Foveon X3 design. Photons of different energies (wavelengths) are absorbed at depth with different probabilities. Foveon employs a special silicon process to manufacture three layers of photodiodes in a stacked fashion, so color information can be deconvolved from the varying intensities of the three channels. This method brings a huge improvement in spatial resolution. If I could make charge binning before readout possible (perhaps with some pixel driving hack, but very difficult), it would be possible to switch between a color and a monochromatic camera at will!

Before the Merrill generation, the original 4.7MP Foveon F13 (part number FX17-78-F13D-07) was available for sale with FAE support. But after some research I found it lacks integrated ADCs, and its temporal noise is high for want of a true CDS readout. Also, each layer has a dedicated amplifier, making charge binning across depth physically impossible. This also makes the pixel design very complicated – there are 13 transistors in each pixel. Quantum efficiency is relatively high at its peak but falls off rapidly as the wavelength moves toward red or blue.

Chipworks teardown

Foveon F20A teardown from Chipworks

The Merrill generation is a step forward. All three photodiodes share the same floating diffusion with dedicated charge transfer gates. The 6T pixel design increases fill factor and pixel density, and the F20A also has integrated ADCs.

But Foveon has been part of Sigma since 2008, and I do not think they will sell this sensor to third parties anymore. So I took out my hacking skills once more. I picked up a second-hand DP1m camera as it is cheaper than the SD1 Merrill. They use the same sensor, but the DP1 also has liveview and zoom functionality – a big plus for reverse engineering.

 

This camera has a good layout. A single flex cable supplies all the power and control signals to the sensor, and five differential lanes transmit the pixel data. The sensor connector has 29 pins at 0.5mm pitch. The main PCB is a rigid-flex design to save space.

Sigma TRUE II ISP/SoC with 256MB DDR2 in 32bit single memory channel

The firmware should be contained in the NOR flash memory. The back side of this PCB is populated with large passives such as the power inductors of the two integrated DC-DC regulator ICs. The DP1m takes four calibration frames after the first shutter release; the SDRAM is pretty limited on this camera.

Removing the sensor flex cable revealed existing test points on the sensor PCB – exactly what I wanted! After some probing, I found the control interface is a simple two-wire I2C. There are three power rails: 2.5V, 3.3V and 4V. The main ISP also drives a 40MHz clock to the sensor.

Judging from the differential common mode and peak swing, this is again sub-LVDS. The other signals are enables to power regulators and a sensor sync output, all referenced to 3.3V.

X3 Logo and Foveon part number F20A

In the next post we’ll take a deep dive into data fetching, row sync and image characteristics of the Foveon F20A.

Full speed ahead – My new generic VDMA

In 2016 when I built my KAC-12040 camera, I wasn’t satisfied with the Xilinx VDMA IP: it only closes timing at 150MHz, nor does it support arbitrary sizes for a compressed stream. So I wrote my own DMA engine to exploit the full bandwidth of the AXI-HP port on 7-series devices, and managed to close timing at 200MHz at 64 bits. Back then my carrier card only supported 4 LVDS banks from that sensor, so this bandwidth was more than enough for a 1280×720 RAW stream at 600 FPS.

But to achieve this I had to overclock the LVDS transmitters, which led to stability issues on my engineering-grade sensor at high frequencies. To circumvent this, I implemented a sync error detector to dynamically drop bad frames. This solution is fine at moderate overclocking, but as the frequency approaches 200% the drops become so severe that the gain is meaningless. As a result, I couldn’t push further for higher horizontal resolution.

In 2017 I ruled the KAC out as an astro-imaging candidate due to image quality concerns and the availability of better alternatives. From that point on I decided to unleash its full potential as a pure high speed camera. This requires all 8 banks of LVDS signals going into the 7010 FPGA. It was done by routing only a single LVDS clock into an MRCC pin and dynamically figuring out the phase relationship for each data bank. Additionally, I can discard the MSB pins on the later LVDS banks, since 12/14-bit ADC readout is slow enough to use only the first two or four banks. This strategy freed enough IOs for banks 4~7 during high speed operation.

Layout without length matching, all eye diagram and phase are decided at run time

Back

Soldered PCB with a socket

Improvements

This bandwidth requires a higher AXI clock frequency; the limit set by the 7-series AXI port is 250MHz. It was time to completely rewrite my VDMA. Timing closure poses a significant challenge now, especially on Artix-7 C-1 fabric. But careful analysis of my previous design exposed the critical paths to improve:

1. The routing into the hardened ARM processor is very long, and a 64-bit bus can easily cause routing congestion. The solution: avoid additional combinational logic on the data path. Either have a register whose Q output goes directly into the ARM processor, or prepare the write data in a FIFO BRAM with its output directly connected and its read enable as the release control. I chose the second option as it’s more elegant. Setting the internal DO_REG = 1 gives the FIFO one more cycle of latency but significantly improves timing.

TData/TLast directly go into ARM processor without interconnect

2. The AXI interconnect is not that good; the extra logic converting bursts to length 16 wastes resources. Thus I parameterized C_BURST_SIZE and the AWLEN bits correspondingly. When set to 16, the port conforms to AXI3 on the Zynq-7000 ARM processor and the entire AXI interconnect is optimized away at IP integration.

3. Scan TLast and count bursts as the stream comes in, issuing address writes accordingly. Pipeline the logic as needed and insert a double buffer (skid buffer) where necessary.

The result is quite satisfactory: 328 flip-flops and 276 look-up tables, and timing closure is now effortless.

In the meantime, I rewrote my bit concatenator block, which converts arbitrary bit lengths into 64-bit words. The length of each burst is specified in TUser. This IP copes with compressed bit streams or dynamically changing bit depths during sensor operation.
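A behavioral model of the concatenator (software sketch only; the RTL streams TUser-sized beats, which I model here as (value, nbits) pairs):

```python
def concat_bits(items):
    """items: iterable of (value, nbits) beats of arbitrary bit width.
    Yields packed 64-bit words, LSB-first; the final word is zero-padded."""
    acc = 0
    acc_bits = 0
    for value, nbits in items:
        acc |= (value & ((1 << nbits) - 1)) << acc_bits
        acc_bits += nbits
        while acc_bits >= 64:
            yield acc & 0xFFFFFFFFFFFFFFFF
            acc >>= 64
            acc_bits -= 64
    if acc_bits:
        yield acc  # flush the partial final word
```

In hardware the same thing is done with a shifting accumulator register roughly twice the bus width.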

Application

Now I can replace this block in every one of my cameras. With some modification of the LVDS receiver in the KAC-12040, I can now stream 16Gbit per second. At extended row width, 3600 x 720 streams stably at 600FPS!

This speed also extends to some SLVS-EC sensors. Once decoded to 8 bits, 8 channels yield an insane 1.85GB/s data rate.

Update 12/1 – Application use

Multiple channels of my VDMA serving six simultaneous MIPI streams

Successfully ported my VDMA to the Xilinx Ultrascale+ architecture. This application is a six-stream MIPI, 360-degree surround view for an autonomous vehicle. Timing closes at a 300MHz AXI4 clock.

Leave a message below if you are interested in this IP and its pricing!

No datasheet, No FAE, No problem! – The proper way to hack Nikon D850

Two years ago we identified the sensor inside the Nikon D850 with the ChipMod lab. There’s plenty of justification for this sensor in astronomy. It’s the first mass-produced back-illuminated full-frame CMOS. It is very fast, supporting movie resolutions up to 4K30P and 720P at 120FPS. It also has an electronic first curtain and a scan rate fast enough for a fully silent rolling-shutter capture mode. The chip packaging is compact with a single connector, and a metal frame directly attached to the sensor enables quick thermal dissipation. Nothing could be more perfect.

This article will be technical in many aspects and serves as documentation of how we approach such problems in the R&D process. Let’s roll!

Initial probing and speculation

The first step is to separate the power rails from the signal buses. Power traces are usually thick and connect to multiple pins to reduce resistance and impedance, allowing high current to flow; many also have nearby capacitors to reduce ripple. At the same time we can find the ground pins using a multimeter, as well as the control pins connected to each power regulator.

Four thick traces ending with big electrolytic capacitors are power rails

Next we can search for the connector part number with some basic measurements such as pin pitch, pin count and connector type. This one is clearly a mezzanine connector with a middle slot. On DigiKey or Mouser we can quickly filter their vast inventory down to a few dozen candidates. Then it’s really just reading datasheets to see which one matches.

The flex cable tells us more. There are 17 differential lanes, meaning one functions as a clock for the other 16 data lanes. This dedicated clock lane tells us no embedded clock like PCI-E or USB3 is needed, and the speed should be low enough for most cost-sensitive FPGAs.

The other traces are power enable and sensor controls. Judging from typical large Sony sensors, this one is probably again SPI running in HD VD slave driving mode.

Using a sniffer board

This time Nikon designed the connector well – so well that I can flip the flex cable and put the sensor above the main PCB. I decided to build a stacking female-male board for data logging.

This board is just a passive pass-through for most signal traces, while the control buses can be tapped.

Flipping the flex cable around and exposing the sensor for easy connection

Logic analysis

As expected, this sensor shares the common SPI protocol just like the IMX071 and IMX094, except its increased functionality requires a 16-bit address space for more registers. In still capture mode, the line period is around 11us. This gives a whopping 15FPS in 14-bit mode – truly amazing! From here I can make some rough deductions about the clock frequency and data rate.

Because each line contains at least 8256 pixels, 7224 bits are distributed over each lane per line. The minimum required lane rate is 657Mbps. The supplied clock to the sensor’s internal PLL is 72MHz, so the closest multiple should be 720MHz internally and a 360MHz DDR clock.
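The arithmetic behind this deduction, assuming 14-bit pixels spread over the 16 data lanes:

```python
# Sanity check: bits per lane per line, and the minimum lane rate
# implied by an 11us line period.
pixels_per_line = 8256
bits_per_pixel = 14
lanes = 16
line_period_us = 11.0

bits_per_lane = pixels_per_line * bits_per_pixel // lanes  # 7224 bits
min_rate_mbps = bits_per_lane / line_period_us             # about 657 Mbps
```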

720Mbps should be OK for most FPGA I/O banks. Great!

For the detailed SPI protocol, I wrote some bash and python scripts to automatically compare the settings between different modes. ISO, shutter, digital gain and region of interest are clearly defined. For the electronic first curtain, the IMX309AQJ appears to use internal register settings to drive the charge reset scan, in contrast to the Sony A-series which uses an external pulse.

Power sequencing

Another important aspect is power sequencing. Correct supply voltages and power-on timing are essential; some mixed-signal ICs even require strict power sequencing just to avoid frying the circuit. A logic analyzer can capture such a sequence, but capacitor charging and discharging delays the perceived edges a lot, so in most cases a scope is preferred.

Digital signal rises quickly but power rails slowly charges decoupling capacitors

Some rails can sleep during long exposures while the digital circuits are inactive, so it’s beneficial to log their behavior and relative timing to other control signals as well. In all, six voltage rails supply this sensor module, three of which are low-voltage 1.2V rails for the digital logic and high speed interfaces.

Layout of carrier card

With the information almost complete, I can lay out a carrier card that includes the necessary power regulators and bridges the high speed signals into the FPGA. Only a single I/O bank is needed.

Carrier card in black solder mask

The carrier card is designed with the IMX309AQJ sensor mechanically centered relative to the microZed SoM. Due to board-to-board stack height and room constraints, most regulators are placed on the back side. I wrote simple power sequencing logic mimicking the D850’s and verified all voltages are correct. With the sensor attached and the first power-on, I breathed a sigh of relief – nothing went up in smoke!

Driving the sensor and verify the clock frequency

For a fast bring-up, I duplicated a configuration register setting from liveview where the sensor is free-running. This drives the high speed clock continuously for frequency measurement. There are ways to do this with only digital counting logic, no high-performance scope required.

reg [15:0] freq_counter = 0;
reg toggle_sync = 0, toggle_sync1, toggle_sync2;
reg [15:0] freq_counter_prev;
always @(posedge clk_div) begin
    toggle_sync1 <= toggle_sync;
    toggle_sync2 <= toggle_sync1;

    freq_counter <= freq_counter + 1;
    if (toggle_sync2 != toggle_sync1) begin
        freq_counter <= 'h0;
        freq_counter_prev <= freq_counter;
    end
end

reg [9:0] standard_counter = 0;
reg [15:0] freq_counter_sync;
always @(posedge s00_axi_aclk) begin
    standard_counter <= standard_counter + 1;
    freq_counter_sync <= freq_counter_prev;
    if (standard_counter == 999) begin
        standard_counter <= 0;
        toggle_sync <= ~toggle_sync;
    end
end

Frequency measurement logic

The idea is to latch the tick count at fixed absolute time intervals. s00_axi_aclk is a standard 100MHz clock; after 1000 ticks (10us) we signal across the clock domain to latch the tick counter in the target clock domain and reset it. Since measuring the fast clock directly can pose timing constraint issues, we use a BUFR to divide it down to clk_div.

The frequency matches my guess of 360MHz. This is a DDR clock, sampling on both rising and falling edges.

Sensor stack on carrier card

Getting data with ISERDES and DMA

I do not know the phase relationship between this clock and the data lanes, so I migrated my Dynamic Phase Alignment algorithm from the KAC camera. This lets the IOB scan the transition edges for an eye diagram and sample at the best possible location. In the figure below, "x" indicates where rising and falling edges happen and "-" marks the eye opening region. It’s apparent that the skew between data lanes is minimal, even with half an inch of peak difference in trace length, so I can simply set a single tap delay value for all of them at the center of the data eye.
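The tap-picking step can be sketched like this (a simplified model of my own; the scan-string representation mirrors the "x"/"-" figure and is illustrative, not the actual RTL):

```python
def best_tap(scan: str) -> int:
    """scan: one character per IDELAY tap, 'x' = transitions seen,
    '-' = stable. Return the tap at the center of the widest stable run."""
    best_start, best_len = 0, 0
    start = None
    for i, c in enumerate(scan + 'x'):       # sentinel closes a trailing run
        if c == '-' and start is None:
            start = i
        elif c != '-' and start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    return best_start + best_len // 2
```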

There are 16 lanes. I used 1:4 deserialization in ISERDESE2 to construct a 64-bit AXI stream, which is then continuously fed into a DMA engine. The idea is to make this a high speed logic analyzer first: the dumped binary file should contain all information regarding frame and row synchronization, video blanking and effective pixel data in serial format.

Sync sequence and data format

Most large-format Sony sensors use a bespoke synchronization sequence rather than the SAV/EAV defined by ITU. From an xxd dump of the DMA binary file, I found this sensor to be no different.

First line – FFF 000 FFF 000 FFF

Other lines – FFF 000 FFF 000 000

There are no end-of-line sequences, so counters must be used in conjunction with an SOL sequence detector state machine.

Viewing properly formatted image stream

With this information ready, I can implement line and row counters for a proper VDMA. The same DMA IP I built for the KAC-12040 provides a 1.6GB/s data rate. Properly formatted images can then be transferred into memory.

As usual, with everything in shape, I implemented a freeRTOS system to handle Ethernet command and control. Video streams are based on UDP packets. On a typical POSIX system like macOS or Linux, there are no packet drops over a direct 1G connection.

Right now the lens mount is Canon EOS. Unfortunately I have neither an adapter nor an EF lens with me; I will update with real images later.

Line skipping and video formats

Before thinking about functionality, I first need to figure out the operating modes. This can be done by comparing register settings, and some simple python scripting enables such comparison.
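The comparison boils down to a dictionary diff. A minimal sketch (the `diff_registers` helper is hypothetical, with register dumps assumed parsed into address-to-value maps beforehand):

```python
def diff_registers(mode_a: dict, mode_b: dict) -> dict:
    """Return {addr: (a_value, b_value)} for registers that differ
    between two dumped SPI configurations; missing entries show as None."""
    addrs = set(mode_a) | set(mode_b)
    return {a: (mode_a.get(a), mode_b.get(a))
            for a in sorted(addrs)
            if mode_a.get(a) != mode_b.get(a)}
```

Diffing a pair of modes that differ only in, say, shutter time leaves just the exposure registers standing out.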

All still modes use 14-bit ADC readout, including the fully silent mode. To speed up video, the sensor has to sacrifice ADC resolution and readout lines. In total, I found 4 different driving modes; all video modes use 12-bit readout.

Liveview base/1920×1080/1280×720 FX/DX 60 – 24 FPS

1280×720 FX/DX 120/100FPS

4K FX 30 – 24 FPS

4K DX / Liveview Zoom mode

The first two modes read the full sensor area with horizontal binning and vertical line skipping (subsampling). The DX 4K and magnified modes are 1:1 windowed readout, and I see no color aliasing. The FX 4K mode scans a larger area and appears to use vertical binning instead of skipping for better quality.

Additional functionalities

With the above analysis, I can isolate the registers responsible for ISO analog gain, digital gain, exposure time and window cropping. Some of these functions can be applied to modes where they are normally not enabled. For example, I combined 14-bit readout with window cropping so that in SNR-critical scenarios a small region can be read at full bit depth.


Partial readout with lots of vertical blanking

A lot more awaits discovery!

 

Update 9/16

I played around with various movie modes and here are some updates on the imaging sizes.

The high resolution 4K DX movie mode runs a 5520 x 3070 readout at 7.5us per row. In FX mode this is 8352 x 2328 at 9.34us per row. There are an additional 22 rows ahead of each frame for bias calibration. The 4K output is downsized from these imaging areas. The DX readout area is roughly 16:9 within the APS-C crop region of this sensor. This sensor can perform a 14-bit ADC conversion in 11us; at 12-bit readout, most single-slope ADCs run in a quarter of that time. The line rate is limited not by the ADC but by how fast data is sent off, so limiting the horizontal region is a perfect solution here. In FX readout mode the aspect ratio is doubled, close to 32:9. If all 4656 lines were read, the frame rate could only reach 23FPS. What Sony probably did was run the ADC twice on alternating lines and vector-add the values before sending them off; the ISP then downsizes on the X axis.
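A quick check of this timing claim, using the figures above (adding the 22 bias rows to the binned row count is my own assumption):

```python
# FX 4K mode: reading every line vs. vertically binning pairs of lines.
row_period_us = 9.34
fps_all_lines = 1e6 / (4656 * row_period_us)      # every line read: about 23 FPS
rows_binned = 2328 + 22                           # binned rows plus bias rows
fps_binned = 1e6 / (rows_binned * row_period_us)  # leaves headroom above 30 FPS
```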

In 1080P mode the resolution happens to be 1/3 in each direction: 2784 x 1854. This mode is similar to the IMX071 we’ve seen before – the sensor bins horizontal pixels internally and then skips two rows for each row read.

Update 12/3 – Version 2 PCB

In version two I swapped the 1.2V LDOs for a buck step-down converter to alleviate the thermal issue on such a compact PCB. The LDOs require a minimum 2V input supply. The 1.2V output rails feed the sensor digital logic, SLVS transmitter and PLL circuits; all of these draw a lot of current, making LDOs very inefficient, and digital rails are not as sensitive to ripple as their analog counterparts. All three now share the same source regulator, two of them gated by a load switch to reduce inrush current during power sequencing.

In addition, I relayed the 72MHz crystal clock across the logic level shifter into the FPGA MRCC pin. This is for the future IMX410BQT sensor. We have this sensor in hand but haven’t figured out which camera it came from – presumably the new D780 DSLR. Since the IMX410 in the Sony A7III uses SLVS-EC, we need a reference clock for the PLL to run from. Other IOs are remapped accordingly to make room in the new layout. The reserved pads for termination resistors are removed; the FPGA-internal LVDS_25 DIFF_TERM works just fine for SLVS at 200mV common mode.

Both sensors look identical, with the same package design
