BTC FPGA Miner Challenge - Best Hashrate, Lowest power per hash
 Status: Pending
 Prize: $672
 Entries received: 13
Contest brief
For 10 years, poor FPGA BTC mining implementations have completely missed the big picture, producing excessively large, slow, power-hungry designs. Researchers have presented dozens of papers on how to make this better, and still missed the mark. This is your chance to get it right. Read this paper https://ieeexplore.ieee.org/document/9691379, then https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9151160, and look at their Verilog at github.com/archlabnaist/DoubleCMESHA256 to get a good understanding of state-of-the-art FPGA BTC mining in Verilog. Then apply that to YOUR FORK of the old standard at https://github.com/fpgaminer/OpenSourceFPGABitcoinMiner, with an updated proxy for getwork.
Clues follow to make FPGA BTC mining faster, smaller, and lower power, so that you will have REAL bragging rights for the fastest, smallest, lowest-power FPGA miners. Goal: >10x speedup.
1) The SHA256 compression is seeded with 256 bits of very random constants and forms a large shift register as the seed text and W[n] expansion pass results are mixed in during the next 64+64 rounds of the 2nd and 3rd SHA256 passes. The DoubleCMESHA256 paper shows you how to factor unnecessary work away for a leaner pipelined design. But it misses the big picture optimization of removing a completely wasteful group of registers doing a simple shift operation. As such the minimum fully unrolled digest requires about 256*(64+64)=32,768 registers of which 25% capture the rounds compression results and 75% are simply copying data in the shift operation. Big gains (>24,576 registers) for fixing this mistake.
2) RTL logic designers break up large combinatorial logic delays into small pipeline chunks to match the fast clock rate needed for external interfaces, mostly to avoid multiple clock domain crossings for high volumes of data. Routing delays combined with register setup and hold times become a significant part of each clock cycle, and dominate as the designer tries to reach excessively high clock rates. This is often considered Best Practice. It is, however, HORRIBLY WRONG when high bandwidth, low resource use, and low power are the three critical optimization metrics for a successful reconfigurable computing project. EVERY CLOCK on state registers burns power, so in an optimal design the combinatorial logic delays and power consumption should fill 95% or more of each clock cycle, with the least routing losses and fewest clock cycles per hash. There is a sweet spot in this combinatorial length: power is wasted again when the path becomes too long and cascaded gates oscillate (glitch) while their multiple inputs are still changing.
3) Linear expressions create unnecessary linear time delays that may not be recognized and optimized out by the tools. Because of this, expressions like A + B + C + D + E + F + G + H (7 serial adder delays) should be written as (((A + B) + (C + D)) + ((E + F) + (G + H))) (3 serial adder delays), with each matched addition pair in parallel. Most synthesized arithmetic expressions are done with 3:2 or 4:2 full adder compressors (A+B+Carry), which even in tree form can still generate some uncertainty oscillations. A lower power FPGA design with 6-input LUTs and a hardware carry circuit is to implement 6:3 or 7:3 full adder compressors when there are three or more sequential operators to combine in parallel ... plus carry lookahead.
4) To extract the last few percent of bandwidth, resource, and power optimization, take word-wide synthesis of expressions completely out of the Verilog, and reduce each bit lane down to ANF (algebraic normal form) with ANF product terms shared across all expressions. Specialized synthesis.
5) Use a Gray code nonce counter, for stable, lower peak currents at the clock edge.
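Clue 5 can be illustrated with a short C model. This is only a sketch under assumptions, not code from any of the referenced miners: the function names are made up for illustration, and it assumes the nonce register simply steps through the binary-reflected Gray code. Because that mapping is a bijection on 32-bit values, every nonce is still visited, just in a different order, and exactly one register bit changes per step.

```c
#include <stdint.h>

/* Binary-reflected Gray code of a 32-bit counter value. */
uint32_t gray_encode(uint32_t n) { return n ^ (n >> 1); }

/* Inverse mapping: recover the binary counter value from its Gray code. */
uint32_t gray_decode(uint32_t g) {
    g ^= g >> 16;
    g ^= g >> 8;
    g ^= g >> 4;
    g ^= g >> 2;
    g ^= g >> 1;
    return g;
}

/* Number of set bits in x. */
unsigned bits_set(uint32_t x) {
    unsigned c = 0;
    while (x) { x &= x - 1; c++; }
    return c;
}

/* Register bits that toggle when the nonce steps from n to n + 1. */
unsigned binary_step_toggles(uint32_t n) { return bits_set(n ^ (n + 1)); }
unsigned gray_step_toggles(uint32_t n) {
    return bits_set(gray_encode(n) ^ gray_encode(n + 1));
}
```

For example, stepping a plain binary counter from 0x0000FFFF toggles 17 register bits at once, while the Gray-coded register toggles one.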
The winner is the best entry by averages across the Xilinx XC7Z010, Altera 10M08 Dev Kit, and GOWIN GW1NR-UV9 (Tang Nano 9K), each with an RPI Pico controller. Weighting: 10% speed and 10% power for each FPGA vendor, plus 20% real solo mining and 20% real pool mining for 48 hours. Submit your GitHub fork.
Claim YOUR best engineer bragging rights?
Recommended skills
Public clarification board

Contest holder  1 day ago
Dang ... nobody finished by March 15 UTC midnight :( ... ... ... Freelancer says 15 days left ... extend into April? Suggestions?
Complete your GitHub project with sources, prebuilt RPI/FPGA images, and the required testing report, for peer review. It must work on all three platforms. See the comment and replies from last month about the testing report.
Step 1: Start by building https://github.com/fpgaminer/OpenSourceFPGABitcoinMiner for all three target devices, using an updated mining proxy (3-12 hrs)
Step 2: Incorporate the round folding found in nalex87/VerilogSHA2561/blob/master/main.v (2-8 hrs)
Step 3: Use the floor planner on all three devices to optimize for best-case hash rate at the lowest power. (10+ hrs)
Step 4: Apply additional improvements to each target device: LUT packing, worst-case delay management.
Steps 1 and 2 make a viable entry. The likely winner will be the best of Steps 3 and 4.

Contest holder  3 weeks ago
I don't have the funds to extend this again ... we will need to find another sponsor to help, have everyone contribute $12, or let it close. I'm 71, retired, and living on a small Social Security check every month. I did this to create a project for young engineers to learn a valuable skill set from ... just like I did 15 and 20 years ago. I cannot choose between participants based solely on the project graphic, and I have no access to their project files unless they are moved to a public open source GitHub repository. See my reply to thread.
Extend by:
3 days (+$93.88 USD)
5 days (+$125.50 USD)
7 days (+$187.75 USD)
14 days (+$250.00 USD)
21 days (+$312.25 USD)
Note: You can extend your contest as many times as required, 50% of the fee will be added to the prize money (rounded to the nearest whole number). Your contest will also be automatically upgraded to a Guaranteed contest for FREE. This is to reward freelancers for the additional work to be done and guarantee that a winner will be chosen.

Contest holder  3 weeks ago
Maybe unofficially extend this another 17 days and then use group consensus 10 days before the automatic distribution? ... if so create your target repository, with a progress report ... add that work in progress repository URL to this thread.
Maybe a single repository appears then and takes it all by simply forking one of the starting github projects and building on all three platforms. Worst case the roughly 30 active entries get less than $20 each, if no one posts a repository. Both are pretty lame endings after 7 weeks of this project being open.
... clock is ticking ...
Your prize will be distributed to qualified participating freelancers in 27 days, 10 hours if you don't select a winner.

Contest holder  3 weeks ago
Reply here with your GitHub entry's URL, so the group can rank your project. I had started this contest with this:
Then apply that to YOUR FORK of the old standard in https://github.com/fpgaminer/OpenSourceFPGABitcoinMiner with an updated proxy for getwork.

Contest holder  3 weeks ago
Reply to this comment thread with your ranking of the top 3 entries, so we can reach group consensus.

Contest holder  3 weeks ago
I'm disappointed that no one wants to claim the full prize amount ... otherwise ...
It's time to choose a winning entry
Your contest has ended and freelancers can no longer enter. Please choose a winner or extend the contest if you haven't found an entry to award.
Your prize will be distributed to qualified participating freelancers in 1 month if you don't select a winner.

Contest holder  4 weeks ago
Congratulations to the teams that have posted their intent to complete the contest. As near the end of the contest as practical, a GitHub project submission is required, with sources, built images for the 3 FPGAs and the controller, plus a report. The report will summarize your design methodology and fill in key performance metrics in a spreadsheet. I will post a link to the report outline and initial spreadsheet here in the comments by the end of tomorrow; you will need to complete both for your GitHub submission. Comments to improve this are welcome.
The combined spreadsheet data will determine the winner, based on the metrics described at the end of the contest details. There will be some final adjustment of each team's data to normalize device temperatures and clock rates, to determine the fastest and lowest power design.
Everyone can verify other teams' submissions. With group consensus a winner may be declared quickly. Otherwise I will test and select the winner within a week, as final arbiter.

Contest holder  4 weeks ago
Ok ... we need to bring this to a close, and determine a winner. I appreciate the comments by Abhishek; we concur on a number of different architectural approaches teams could use to complete this contest. Each of the suggestions I have offered in these comments comes from designs I did over 10 years ago, using much older Xilinx Virtex parts with highly edited FpgaC netlists and constraints.
I haven't done much FPGA work since, mostly from a strong dislike of Xilinx after they forced the FpgaC project to shut down, claiming violation of IP rights from their published documentation. I included the XC7Z010 for this contest because it's a great product, and there are a number of low cost boards floating around ... especially the surplus mining controller boards.
I'm new to both GitHub and Freelancer contests ... each team needs to disclose their GitHub project entry, i.e. make it visible to all teams and myself. Can we get consensus this week?

AbhishekEG
 4 weeks ago
#9 & #10

Contest holder  2 months ago
This diagram is a good conceptualization tool, which highlights the 7-input adder that is central to the SHA256 compression rounds: https://commons.wikimedia.org/wiki/File:SHA2.svg#mediaviewer/File:SHA2.svg
Implementing this function:
b[i+1].a = (b[i].h + SIGMA1(b[i].e) + Ch(b[i].e, b[i].f, b[i].g) + b[i].K + b[i].W) + (SIGMA0(b[i].a) + Maj(b[i].a, b[i].b, b[i].c));
N-input adders are an interesting topic that many people ignore.
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8843959
https://www.epfl.ch/labs/lap/wpcontent/uploads/2018/05/ParandehAfsharJan08_EfficientSynthesisOfCompressorTreesOnFpgas_ASPDAC08.pdf
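A small C model can make the 7-operand addition concrete. This is a sketch with hypothetical function names, and the carry-save arrangement shown is one common way to build a multi-operand adder from 3:2 compressors; it is not taken from the linked papers. It compares a serial chain (6 carry-propagate adds in series), a balanced tree (3 adder levels), and a carry-save reduction that needs only one carry-propagate add at the very end. All three give the same result modulo 2^32.

```c
#include <stdint.h>

/* 3:2 compressor (carry-save adder): three words in, two words out, with
   sum + carry == a + b + c (mod 2^32) and no carry rippling across bits. */
static void csa32(uint32_t a, uint32_t b, uint32_t c,
                  uint32_t *sum, uint32_t *carry) {
    *sum = a ^ b ^ c;
    *carry = ((a & b) | (a & c) | (b & c)) << 1;
}

/* Left-to-right chain: 6 carry-propagate adders in series. */
uint32_t add7_serial(const uint32_t x[7]) {
    uint32_t acc = x[0];
    for (int i = 1; i < 7; i++) acc += x[i];
    return acc;
}

/* Balanced tree: 3 carry-propagate adder levels. */
uint32_t add7_tree(const uint32_t x[7]) {
    return ((x[0] + x[1]) + (x[2] + x[3])) + ((x[4] + x[5]) + x[6]);
}

/* Carry-save: five 3:2 compressors in four levels, then ONE final add. */
uint32_t add7_carry_save(const uint32_t x[7]) {
    uint32_t s0, c0, s1, c1, s2, c2, s3, c3, s4, c4;
    csa32(x[0], x[1], x[2], &s0, &c0);   /* level 1 */
    csa32(x[3], x[4], x[5], &s1, &c1);   /* level 1 */
    csa32(s0, c0, s1, &s2, &c2);         /* level 2 */
    csa32(c1, x[6], s2, &s3, &c3);       /* level 3 */
    csa32(c2, s3, c3, &s4, &c4);         /* level 4 */
    return s4 + c4;                      /* single carry-propagate add */
}
```

In hardware terms, each compressor level costs roughly one LUT delay rather than a full carry chain, which is the motivation for wider compressors on LUT-based devices.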

AbhishekEG
 4 weeks ago
Yes, the diagram provides a clear visual representation of the SHA256 compression function, which is used to transform a 64-byte message block into a 256-bit message digest. The function involves various operations, including logical functions, shifts, and modular additions, which are applied to different components of the message block.
Regarding the use of N-input adders, this can be a useful optimization technique in FPGA design, as it can reduce the number of adder cells required for a given computation. N-input adders can be implemented using a tree of smaller adders, which can lead to reduced power consumption and improved performance. The papers you linked discuss various techniques for efficient synthesis of compressor trees on FPGAs, which can be useful in the design of high-performance and energy-efficient FPGA-based systems.

Contest holder  2 months ago
With crypto projects, using lots of small FPGAs for high performance reconfigurable computing is a simple solution to the otherwise impossible power and thermal management of large FPGAs.
Early (and even some current) FPGA designs target traditional control and glue logic, where only 5-15% of the logic is "actively switching". Dense crypto algorithms have an average toggle rate of roughly 50% of gates, simply because the algorithms are attempting to fully randomize bits ... a coin toss per gate to switch, or not switch ... to be a zero or a one ... each gate retains its previous value about 50% of the time, and toggles about 50% of the time.
Toggling consumes power to charge parasitic capacitance in gates and routing, or to shunt that charge to the ground rail. ASIC miner chips face the same problem ... they are all relatively small chips, and a miner uses a lot of them, to distribute the heat across many chips.
Fast, power-efficient operation is critical for usable mining rigs.
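The toggle-rate contrast can be estimated with a few lines of C. This is a rough sketch under stated assumptions, with hypothetical function names: a plain binary counter stands in for control-style logic, and the 32-bit finalizer from MurmurHash3 stands in for dense crypto logic purely because it scrambles bits well. It is NOT SHA256, and a software bit count is only a proxy for gate-level switching activity.

```c
#include <stdint.h>

/* Number of set bits in x. */
static unsigned popcount32(uint32_t x) {
    unsigned c = 0;
    while (x) { x &= x - 1; c++; }
    return c;
}

/* Stand-in scrambler: MurmurHash3's 32-bit finalizer (not SHA256). */
static uint32_t mix32(uint32_t h) {
    h ^= h >> 16; h *= 0x85ebca6bu;
    h ^= h >> 13; h *= 0xc2b2ae35u;
    h ^= h >> 16;
    return h;
}

/* Average fraction of 32 register bits that flip per step when the
   register holds a plain binary counter. Assumes steps > 0. */
double toggle_rate_counter(uint32_t steps) {
    uint64_t flips = 0;
    for (uint32_t n = 0; n < steps; n++) flips += popcount32(n ^ (n + 1));
    return (double)flips / (32.0 * steps);
}

/* Same measurement when the register holds scrambled values. */
double toggle_rate_scrambled(uint32_t steps) {
    uint64_t flips = 0;
    for (uint32_t n = 0; n < steps; n++)
        flips += popcount32(mix32(n) ^ mix32(n + 1));
    return (double)flips / (32.0 * steps);
}
```

Over 65,536 steps the counter flips about 2 of its 32 bits per step (roughly 6%), while the scrambled register flips close to half of them.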

AbhishekEG
 4 weeks ago
Yes, power and thermal management are critical concerns for high performance reconfigurable computing, especially in the case of dense crypto algorithms with high toggle rates. Using small FPGAs can be a simple solution to address these issues as distributing the workload across multiple smaller FPGAs can help manage power consumption and heat dissipation. Additionally, using more efficient algorithms and hardware designs can also help minimize power consumption and heat generation. Overall, efficient power and thermal management are key considerations for designing usable mining rigs.

Contest holder  2 months ago
There are more than a dozen area vs. time design solutions, allowing a mix of different implementations to best fill a device to capacity for the best maximum hash rate. So let's explore architectures that may allow using idle resources in an FPGA, and/or better packing.
1) Using block RAM as memories, with a simple ALU design.
2) Using LUT RAM as memories, possibly dual port, with a simple ALU design.
3) Using block RAM as sequencers, including main counters.
4) Using LUT ROMs as sequencers to compact control logic.
5) Factoring control logic and K memories, to feed and control multiple 'slimmer' expanders/compressors.
Each of these architectural approaches can be mixed and matched, giving multiple very different solutions, even for the smallest devices. With more than a dozen block memories available, combining them with a LUT-based ALU and sequencer provides valuable extra hashers. Finding the optimal ratios is a simple linear programming problem.
Multiple team members help here.
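The "optimal ratios" idea can be sketched as a tiny search. Everything here is hypothetical: the struct and function names are invented, and the resource and throughput numbers used to exercise it are made-up placeholders, not measurements from any device. For just two hasher variants, a brute-force integer search is used instead of a real linear programming solver, since the counts must be whole numbers anyway.

```c
/* One hasher architecture: resource cost and relative throughput. */
struct hasher_variant { int luts; int brams; int rate; };

/* Best mix found: how many of each variant, and the combined rate. */
struct mix_result { int n_a; int n_b; int rate; };

/* Try every count of variants A and B that fits the device budgets.
   Assumes each variant uses at least one LUT, so both loops terminate. */
struct mix_result best_mix(struct hasher_variant a, struct hasher_variant b,
                           int lut_budget, int bram_budget) {
    struct mix_result best = {0, 0, 0};
    for (int na = 0;
         na * a.luts <= lut_budget && na * a.brams <= bram_budget; na++) {
        for (int nb = 0; ; nb++) {
            int luts = na * a.luts + nb * b.luts;
            int brams = na * a.brams + nb * b.brams;
            if (luts > lut_budget || brams > bram_budget) break;
            int rate = na * a.rate + nb * b.rate;
            if (rate > best.rate) {
                best.n_a = na;
                best.n_b = nb;
                best.rate = rate;
            }
        }
    }
    return best;
}
```

With a placeholder LUT-only variant (3000 LUTs, rate 3) and a block-RAM variant (1200 LUTs, 4 block RAMs, rate 1) on a device with 8000 LUTs and 10 block RAMs, the search picks two of the first and one of the second: the small block-RAM hasher fills space the larger variant would leave idle.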
AbhishekEG
 4 weeks ago
Using block RAM as memories with a simple ALU design:
Benefit: Block RAM is typically faster than LUT RAM, and can be used as scratchpad memory for intermediate results. A simple ALU design can be easy to implement and may allow for higher clock speeds.
Tradeoff: Using block RAM for memory may not be as area-efficient as using LUT RAM, since block RAM is typically larger. Additionally, a simple ALU design may not be able to take advantage of some of the more complex optimization techniques that could be applied to the BTC algorithm.

AbhishekEG
 4 weeks ago
Using LUT RAM as memories, possibly dual port, with a simple ALU design:
Benefit: LUT RAM can be more area-efficient than block RAM, allowing for more compact designs. Dual-port LUT RAM can be useful for implementing complex memory access patterns.
Tradeoff: LUT RAM is typically slower than block RAM, and may not be able to achieve as high clock speeds. Additionally, a simple ALU design may not be able to take advantage of some of the more complex optimization techniques that could be applied to the BTC algorithm.

Contest holder  2 months ago
AND ... there are other significant optimization strategies NOT provided ... use your skills. Put a good team together, pool your formal training, interests, expertise and experience. WIN YOUR bragging rights, and EARN your TOP job in the FPGA reconfigurable computing accelerator industry!
The skills learned and demonstrated in this project are extremely valuable when applied to real world algorithm implementation for FPGA accelerated data centers. This project should become a good resume builder, as reconfigurable computing emerges from research to production. Other POW algorithms for blockchain can also best be served with FPGA reconfigurable computing, since expensive ASIC implementations are not very flexible. BTC is just one of many that can easily be implemented on an FPGA.
I'll take donations from other entities, vendors, and mentors to increase the prize for this contest. I'm semi-retired, and the nearly $600 for this contest with fees is the limit of my personal budget.
Suggestions?

AbhishekEG
 4 weeks ago
I completely agree with you. FPGA implementation of algorithms is becoming increasingly important in the field of high-performance computing, and having experience in this area can be a valuable asset for individuals seeking jobs in this field. Moreover, as you mentioned, FPGA-based implementations can be more flexible and cost-effective than ASIC implementations in some cases, which is why they are often preferred for prototyping and testing new designs. This can lead to faster development cycles and ultimately better products. I think that the skills you have demonstrated in this project could be very beneficial in a variety of fields.

Contest holder  2 months ago
In C:
struct buf {
unsigned int K,W,a,b,c,d,e,f,g,h;
} b[65];
with rounds looking like:
b[i+1].a = (b[i+0].h + SIGMA1(b[i+0].e) + Ch(b[i+0].e, b[i+0].f, b[i+0].g) + b[i+0].K + b[i+0].W) +
(SIGMA0(b[i+0].a) + Maj(b[i+0].a, b[i+0].b, b[i+0].c));
b[i+1].b = b[i+0].a;
b[i+1].c = b[i+0].b;
b[i+1].d = b[i+0].c;
b[i+1].e = b[i+0].d + (b[i+0].h + SIGMA1(b[i+0].e) + Ch(b[i+0].e, b[i+0].f, b[i+0].g) + b[i+0].K + b[i+0].W);
b[i+1].f = b[i+0].e;
b[i+1].g = b[i+0].f;
b[i+1].h = b[i+0].g;
In Verilog, remove the registers from 3 of every 4 rounds, or 7 of every 8, and let the combinatorials cascade. This reduces the number of pipeline stages and registers by 75% or 87.5% respectively, lowering footprint and dynamic power. The combinatorial path is now longer, doing more work per clock with a lower percentage of routing, setup, and hold delays. Higher hash rate, even with a slower clock.
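The folding idea can be modelled in C before committing to RTL. The sketch below is an illustration with invented names (struct sha_state, sha256_round, sha256_folded_stage), and it passes K and W as arguments instead of storing them in the struct as the snippet above does. The "folded stage" simply evaluates n rounds back to back without saving state in between, which models n cascaded combinatorial rounds feeding a single register stage.

```c
#include <stdint.h>

#define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))
#define Ch(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
#define Maj(x, y, z) (((x) & ((y) | (z))) | ((y) & (z)))
#define SIGMA0(x) (ROTR((x), 2) ^ ROTR((x), 13) ^ ROTR((x), 22))
#define SIGMA1(x) (ROTR((x), 6) ^ ROTR((x), 11) ^ ROTR((x), 25))

struct sha_state { uint32_t a, b, c, d, e, f, g, h; };

/* One compression round; t1 is shared by the new a and the new e. */
struct sha_state sha256_round(struct sha_state s, uint32_t k, uint32_t w) {
    uint32_t t1 = s.h + SIGMA1(s.e) + Ch(s.e, s.f, s.g) + k + w;
    uint32_t t2 = SIGMA0(s.a) + Maj(s.a, s.b, s.c);
    struct sha_state n = { t1 + t2, s.a, s.b, s.c, s.d + t1, s.e, s.f, s.g };
    return n;
}

/* n rounds with no intermediate state kept: a model of n cascaded
   combinatorial rounds between two register stages. */
struct sha_state sha256_folded_stage(struct sha_state s, const uint32_t *k,
                                     const uint32_t *w, int n) {
    for (int i = 0; i < n; i++) s = sha256_round(s, k[i], w[i]);
    return s;
}
```

Only the two computed words (a and e) are new in each round; the other six are copies, which is why registering every round spends most of its flip-flops on a shift operation.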

AbhishekEG
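 4 weeks ago
module sha256_round(
input [31:0] K,
input [31:0] W,
input [31:0] a_in,
input [31:0] b_in,
input [31:0] c_in,
input [31:0] d_in,
input [31:0] e_in,
input [31:0] f_in,
input [31:0] g_in,
input [31:0] h_in,
output [31:0] a_out,
output [31:0] b_out,
output [31:0] c_out,
output [31:0] d_out,
output [31:0] e_out,
output [31:0] f_out,
output [31:0] g_out,
output [31:0] h_out
);
// Helper functions matching the SIGMA0, SIGMA1, Ch, and Maj macros
function [31:0] SIGMA0;
input [31:0] x;
SIGMA0 = {x[1:0], x[31:2]} ^ {x[12:0], x[31:13]} ^ {x[21:0], x[31:22]};
endfunction
function [31:0] SIGMA1;
input [31:0] x;
SIGMA1 = {x[5:0], x[31:6]} ^ {x[10:0], x[31:11]} ^ {x[24:0], x[31:25]};
endfunction
function [31:0] Ch;
input [31:0] x, y, z;
Ch = z ^ (x & (y ^ z));
endfunction
function [31:0] Maj;
input [31:0] x, y, z;
Maj = (x & (y | z)) | (y & z);
endfunction
// Term shared by a_out and e_out, computed once
wire [31:0] t1 = h_in + SIGMA1(e_in) + Ch(e_in, f_in, g_in) + K + W;
// Combinatorial logic for a_out
assign a_out = t1 + (SIGMA0(a_in) + Maj(a_in, b_in, c_in));
// b_out, c_out, and d_out are plain copies (wiring only)
assign b_out = a_in;
assign c_out = b_in;
assign d_out = c_in;
// Combinatorial logic for e_out
assign e_out = d_in + t1;
// f_out, g_out, and h_out are plain copies (wiring only)
assign f_out = e_in;
assign g_out = f_in;
assign h_out = g_in;
endmodule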

AbhishekEG
 4 weeks ago
With this module, you can chain together 8 of these combinational logic blocks to represent 8 rounds of SHA256. Placing registers only after every 4th or 8th round removes them from 3 of every 4 rounds, or 7 of every 8, and reduces the pipeline stages and registers by 75% or 87.5% respectively, lowering footprint and dynamic power.
By doing this, the combinatorial path is now longer, doing more work per clock with a lower percentage of routing, setup, and hold delays. This can result in a higher hash rate, even with a slower clock. However, it's important to note that this design may be more sensitive to timing and may require more careful timing analysis and testing.

Contest holder  2 months ago
Big gains for a 10 year old, widely studied and used algorithm. The CME design reduced the Goldstrike 1 LUT count from 49,145 to 46,013 and the register count from 54,674 to 52,428 (95.9% of the original, a 4.1% net gain).
Compacting 8 rounds into 1 shrinks the compressor by about 28,672 registers, so we now have a target implementation size of 52,428 - 28,672 = 23,756 registers (45.3% of CME, and 43.5% of Goldstrike 1, a 56.5% net gain). There is a smaller additional gain from also compacting the expander in the same 8:1 shrink.
This is a 56.5%/4.1% = 13.8x improvement over CME's effort to reduce register count.
LUT count reductions are not likely to be quite as substantial, but with a switch to 7:3 compressors they should be significant, as that opens the door for packing additional logic besides the adders into LUTs. Hand packing functions should have a substantial effect on area, routing length, power, delays, and clock speed.
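The register arithmetic in this comment can be recomputed directly from the raw counts quoted for Goldstrike 1 and CME. The helper names below are made up for illustration, and the model assumes the simplified estimate used in this thread: 256 state bits registered per round, across 64 + 64 rounds.

```c
/* Registers in the fully unrolled compressor estimate:
   256 state bits per round, 64 + 64 rounds. */
int unrolled_state_regs(void) { return 256 * (64 + 64); }

/* Registers left when only one round in every `fold` is registered. */
int folded_state_regs(int fold) { return unrolled_state_regs() / fold; }

/* Registers removed by folding `fold` rounds into one stage. */
int regs_saved(int fold) {
    return unrolled_state_regs() - folded_state_regs(fold);
}

/* Percentage reduction going from `before` registers to `after`. */
double pct_reduction(int before, int after) {
    return 100.0 * (before - after) / before;
}
```

For example, regs_saved(8) gives the 28,672 registers removed by an 8:1 fold, and pct_reduction(54674, 52428 - regs_saved(8)) gives the net gain of the folded target relative to the Goldstrike 1 register count.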

AbhishekEG
 4 weeks ago
It sounds like the CME design made significant improvements to the register and LUT counts of the algorithm, and the proposed design takes it a step further by compacting multiple rounds into one, resulting in a much smaller implementation size. The use of 7:3 compressors could also have a significant impact on LUT counts and allow for additional logic to be packed into them. Hand packing functions should also result in further improvements in various aspects of the design. Overall, it seems like there is a lot of potential for significant gains in efficiency and performance with this approach.

Contest holder  2 months ago
For those not familiar with implementing SHA256, these are the function macros used in the expander and compressor.
#define ROTL(x, n) (((x) << (n)) | ((x) >> (32 - (n))))
#define ROTR(x, n) (((x) >> (n)) | ((x) << (32 - (n))))
#define Ch(x, y, z) ((z) ^ ((x) & ((y) ^ (z))))
#define Maj(x, y, z) (((x) & ((y) | (z))) | ((y) & (z)))
#define SIGMA0(x) (ROTR((x), 2) ^ ROTR((x), 13) ^ ROTR((x), 22))
#define SIGMA1(x) (ROTR((x), 6) ^ ROTR((x), 11) ^ ROTR((x), 25))
#define sigma0(x) (ROTR((x), 7) ^ ROTR((x), 18) ^ ((x) >> 3))
#define sigma1(x) (ROTR((x), 17) ^ ROTR((x), 19) ^ ((x) >> 10))
A good, tight, fast, low power design can implement these as manually placed IP blocks created in the floor planner, and call the IP blocks out in the Verilog rather than relying on word-level operator synthesis.
Likewise, each pipeline round can be reduced to an IP block that is manually placed using the floor planner. This will minimize routing-length delays and power.
Good P&R is exponential, NP-hard.
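The lowercase sigma0 and sigma1 macros listed in this comment are the ones used by the expander (message schedule). As a minimal sketch, with hypothetical function names and inline functions in place of the macros, the expansion of 16 message words into the 64 words consumed by the compressor rounds looks like this:

```c
#include <stdint.h>

/* Rotate right; n must be between 1 and 31. */
static uint32_t rotr32(uint32_t x, unsigned n) {
    return (x >> n) | (x << (32 - n));
}

static uint32_t small_sigma0(uint32_t x) {
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}

static uint32_t small_sigma1(uint32_t x) {
    return rotr32(x, 17) ^ rotr32(x, 19) ^ (x >> 10);
}

/* Expand the 16 message words in w[0..15] into the full 64-word schedule. */
void sha256_expand(uint32_t w[64]) {
    for (int t = 16; t < 64; t++)
        w[t] = small_sigma1(w[t - 2]) + w[t - 7]
             + small_sigma0(w[t - 15]) + w[t - 16];
}
```

Each new word is a 4-operand sum, so the same adder-tree and compressor considerations that apply to the compressor rounds apply here too.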

AbhishekEG
 4 weeks ago
To implement these functions in hardware, one approach is to manually place IP blocks in the floor planner and call them out in the Verilog code. This can help to minimize routing length delays and power consumption by reducing the number of logic gates required to implement the functions.

AbhishekEG
 4 weeks ago
Each pipeline round can also be reduced to an IP block that is manually placed using the floor planner, which can help to further optimize the design. However, the process of optimizing the design using place-and-route (P&R) tools is complex and computationally intensive, and finding a good, tight, fast, and low-power design is an exponential problem that is known to be NP-hard.

akderia22
 4 weeks ago
#extended please. I am working on it

Contest holder  4 weeks ago
3-1/2 hours left .... It's been a long 7 weeks.

Contest holder  4 weeks ago
I'm actually traveling this week and didn't get time last night to create a document and spreadsheet template inside a github project. Suggestions welcome.
Let's quickly outline the report template here, plus a few replies due to character limits:
[centered]
Team Report: [your GitHub repo name]
BTC FPGA Miner Challenge
Best Hashrate, Lowest power per hash
[left]
Summary
Tang Nano 9K hash rate: 0.0 MH/sec at 0.0 MHz clock rate is 0.0 W/MH
Altera 10M08 hash rate: 0.0 MH/sec at 0.0 MHz clock rate is 0.0 W/MH
Xilinx XC7Z010 hash rate: 0.0 MH/sec at 0.0 MHz clock rate is 0.0 W/MH
Tang Nano 9K stable temp: 0.0C at 0.0CFM with [XXX] heatsink attached
Altera 10M08 stable temp: 0.0C at 0.0CFM with [XXX] heatsink attached
Xilinx XC7Z010 stable temp: 0.0C at 0.0CFM with [XXX] heatsink attached
Tang Nano 9K dynamic power: 0.0A at 0.0Volts is 0.0Watts
Altera 10M08 dynamic power: 0.0A at 0.0Volts is 0.0Watts
Xilinx XC7Z010 dynamic power: 0.0A at 0.0Volts is 0.0Watts
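The derived fields in this template follow from simple arithmetic on the measured values. A minimal sketch, with hypothetical helper names, of how the "is 0.0Watts" and "is 0.0 W/MH" entries could be filled in from measured current, voltage, and hash rate:

```c
/* Power in watts from measured supply current (A) and voltage (V). */
double watts(double amps, double volts) { return amps * volts; }

/* Power per unit hash rate, in W per MH/sec.
   Returns -1.0 when no hash rate was measured. */
double watts_per_mhs(double amps, double volts, double mh_per_sec) {
    if (mh_per_sec <= 0.0) return -1.0;
    return watts(amps, volts) / mh_per_sec;
}
```

Since the hash rate is in MH/sec, the W/MH figure is numerically the energy in joules spent per million hashes, which makes it comparable across devices running at different clock rates.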

Contest holder  1 month ago
Report summary continues with:
Tang Nano 9K static power: 0.0A at 0.0Volts is 0.0Watts
Altera 10M08 static power: 0.0A at 0.0Volts is 0.0Watts
Xilinx XC7Z010 static power: 0.0A at 0.0Volts is 0.0Watts
Solo Mining tested with node: [Node name and IP address] with average rate of 0.0 MH/sec
Pool Mining tested with pool: [Pool name and IP address] with average rate of 0.0 MH/sec
We have verified that each device meets all setup and hold times at idle, with operation inside vendor specified worst case operating conditions for our designs. Solo and Pool Mining results are the sum of one each Tang, Altera, and Xilinx device operating concurrently from the RPI Pico W mining controller(s).
Team Lead: [Your name]
Team Members: [Team member list]

Contest holder  1 month ago
The main section of the report will include at minimum:
Project Design Methodology
[describe the overall architecture of your design common to all devices]
[describe the methods used to improve performance on the Tang Nano 9K device]
[describe the methods used to improve performance on the Altera 10M08 device]
[describe the methods used to improve performance on the Xilinx XC7Z010 device]

Contest holder  1 month ago
Googling for variations of "Bitcoin FPGA mining" gets a lot of hits. Some are pretty cool, as this has been a fun project for a lot of engineers over the years.
http://www.cs.columbia.edu/~sedwards/classes/2014/4840/reports/Halffast.pdf
http://www.cs.columbia.edu/~sedwards/classes/2014/4840/reports/Halffast.tar.gz
On GitHub this is another gem .... kramble/DE0NanoBitCoinMiner
And a lot more.

Contest holder  1 month ago
Is everyone doing ok for completion in 10 days, or should we extend this by a couple of weeks so you can achieve your best possible outcome?
Ok ... extended your time, and added some more money to the prize :)
FYI: resources for the ebaz4205 board with XC7Z010
https://theokelo.co.ke/gettingstartingwithebaz4205zynq7000/
http://cholla.mmto.org/ebaz4205/
https://github.com/trebisky/ebaz4205_miner
In one of the first comments for this project, I opened the discussion about collapsing rounds to remove registers and bring more combinatorial logic into the rounds. I had done this in a similar SHA256 project back in 2012 ... here is another SHA256 designer who did a similar design in 2017:
https://github.com/nalex87/VerilogSHA2561/blob/master/main.v
My 2012 design included reordering operations and LUT packing to optimize the worst-case delay path. It moved the WK[] = W[] + K[] operation from the compressor into the expander, plus a few other optimizations.
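The WK[] move can be shown in a few lines of C. This is a sketch with invented function names, not the 2012 design itself: adding the round constant to the schedule word on the expander side means the compressor's critical sum has four operands instead of five, with an identical result.

```c
#include <stdint.h>

/* Expander side: fold the round constant into each schedule word once. */
void precompute_wk(const uint32_t w[], const uint32_t k[],
                   uint32_t wk[], int n) {
    for (int i = 0; i < n; i++) wk[i] = w[i] + k[i];
}

/* Compressor side, original form: five operands feed the shared term. */
uint32_t t1_five_operands(uint32_t h, uint32_t sig1_e, uint32_t ch,
                          uint32_t k, uint32_t w) {
    return h + sig1_e + ch + k + w;
}

/* Compressor side with WK precomputed: four operands feed the shared term. */
uint32_t t1_four_operands(uint32_t h, uint32_t sig1_e, uint32_t ch,
                          uint32_t wk) {
    return h + sig1_e + ch + wk;
}
```

Because the expander runs ahead of the compressor round that consumes each word, the extra addition sits off the compressor's worst-case path.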

Contest holder  2 months ago
Other FPGA mining projects from the past may provide an idea or two for your project. Explore forks of early projects, like GitHub's progranism and WF2021. Most are trivial forks, some have gems.
google open source bitcoin fpga mining
google fpga mining
The goal of this contest is to bring new ideas and advancements to what is a very old project. Sometimes that's incrementally improving others' work, sometimes that's thinking way outside the box and bringing creative new solutions to light.
To be able to compare submissions, and keep this from turning into a massive overclocking contest, I will test all submissions to the same or similar device limits, based on the data sheet limits for each device. Each design will be built under the FPGA vendor's free tool chain. The limits include at minimum thermal limits, current/voltage limits, and clock speed limits, with all connections meeting the setup/hold time limits for your design in the place and route reports.

Contest holder  2 months ago
Your successful submission requires 3 implementations using Altera, GOWIN, and Xilinx student/hobby boards:
Tang Nano 9K board with GOWIN GW1NR-9 FPGA (about $15 from Sipeed on AliExpress or eBay)
Intel Dev Kit EK10M08E144 with Altera 10M08 FPGA (about $52 from Mouser or eBay)
Xilinx XC7Z010 Development Board (about $22 from Shengzhi on AliExpress)
Raspberry Pi Pico W for the mining controller (about $5-10 from AliExpress or eBay)
These may use a heat sink and fan for best hash rate.
The Raspberry Pi Pico is the controller that sets up the FPGA work and handles the WiFi communication for managing Solo or Pool work assignments. The BTC mining lotto randomly gives away $100,000+ every 10 minutes, with about 100M to 1 odds every 10 minutes, or 60K to 1 every month, at the hash rate a small farm of FPGA student boards can yield. More boards, better odds.
$22 XC7Z010 board: https://www.aliexpress.us/item/3256803866335473.html
or Digilent Arty Z7-10 with Xilinx XC7Z010 (about $200 from Digilent)

Contest holder  2 months ago
Sipeed has offered to refund a Tang Nano 9K board for contestants who complete the contest. A small but generous sponsor offer for this contest's developers.
Hello,
Thank you for your support for our Tang product!
We can return the Tang FPGA board fees for developers who have successfully submit your mining contest.
吴才泽 / Caesar Wu
深圳矽速科技有限公司 Shenzhen Sipeed Tech Ltd