If two models have the same architecture but different max_position_embeddings, can I merge them?
For example, Qwen2.5 Math and Qwen2.5 Coder share the same architecture but have different max_position_embeddings. Would merging them yield good results?
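For context on why the architecture match is what matters: a naive linear merge just averages parameters key by key, so it only requires identical parameter names and shapes. `max_position_embeddings` is a config entry, not a learned weight, so averaging never touches it; the merged model simply inherits whichever value your merge tool copies over (typically the base model's). A minimal sketch, using toy scalars in place of real tensors (the key name `layer.weight` is illustrative, not an actual Qwen2.5 parameter):

```python
def linear_merge(state_a, state_b, alpha=0.5):
    # Average parameters key-by-key; requires identical key sets
    # (and, for real tensors, identical shapes -- i.e. same architecture).
    assert state_a.keys() == state_b.keys(), "architectures must match"
    return {k: alpha * state_a[k] + (1 - alpha) * state_b[k] for k in state_a}

# Toy stand-ins for two checkpoints' state dicts:
math_like = {"layer.weight": 1.0}
coder_like = {"layer.weight": 0.0}
print(linear_merge(math_like, coder_like))  # {'layer.weight': 0.5}
```

Whether the result is actually *good* is an empirical question: Math and Coder were fine-tuned toward different objectives, so averaged weights can end up weaker than either parent on its own specialty.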